Mon. Not. R. Astron. Soc. 000 (2010). Printed 21 December 2012 (MN LaTeX style file v2.2)



Kernel PCA for Type Ia supernovae photometric
classification



E. E. O. Ishida 1,2* and R. S. de Souza 3,2

1 IAG, Universidade de São Paulo, Rua do Matão 1226, Cidade Universitária, CEP 05508-900, São Paulo, SP, Brazil
2 Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, D-85748 Garching, Germany
3 Korea Astronomy and Space Science Institute, Daejeon, 305-348, Republic of Korea 



Accepted — Received - 



ABSTRACT 

The problem of supernova photometric identification will be extremely important for
large surveys in the next decade. In this work, we propose the use of Kernel Principal
Component Analysis (KPCA) combined with a k = 1 nearest neighbour algorithm
(1NN) as a framework for supernovae (SNe) photometric classification. The method
does not rely on information about redshift or local environmental variables, so it is
less sensitive to bias than its template fitting counterparts. The classification is
entirely based on information within the spectroscopically confirmed sample and each new
light curve is classified one at a time. This allows us to update the principal component
(PC) parameter space if a new spectroscopic light curve is available, while also
avoiding the need to re-determine it for each individual new classification. We applied
the method to different instances of the Supernova Photometric Classification Challenge
(SNPCC) data set. Our method provides good purity results in all data samples
analysed when SNR>5. As a consequence, we can state that if a sample like the
post-SNPCC were available today, we would be able to classify ≈ 15% of the initial data set
with purity > 90% (D7+SNR3). Results from the original SNPCC sample, reported as
a function of redshift, show that our method provides high purity (up to ~ 97%),
especially in the range 0.2 ≤ z < 0.4, when compared to results from the SNPCC, while
maintaining a moderate figure of merit (≈ 0.25). This makes our algorithm ideal for
a first approach to an unlabelled data set or to be used as a complement in increasing
the training sample for other algorithms. We also present results for SNe photometric
classification using only pre-maximum epochs, obtaining 63% purity and 77% successful
classification rates (SNR ≥ 5). In a tougher scenario, considering only SNe with
MLCS2k2 fit probability > 0.1, we demonstrate that KPCA+1NN is able to improve
the classification results up to > 95% purity (SNR>3) without the need of redshift
information. Results are sensitive to the information contained in each light curve; as
a consequence, higher quality data points lead to higher successful classification rates.
The method is flexible enough to be applied to other astrophysical transients, as long
as a training and a test sample are provided.

Key words: supernovae: general; methods: statistical; methods: data analysis 



1 INTRODUCTION 



Since its discovery (Riess et al. 1998; Perlmutter et al. 1999),
dark energy (DE) has become a big challenge in
theoretical physics and cosmology. In order to improve our
understanding about its nature, multiple observations are
used to add better constraints over DE characteristics (e.g.,
Mantz et al. 2010; Blake et al. 2011; Plionis et al. 2011).
In particular, large samples of type Ia supernovae (SNe Ia) are
being used to measure luminosity distances as a function
of redshift in order to constrain cosmological parameters
(e.g., Kessler et al. 2009; Ishida & de Souza 2011;
Benitez-Herrera et al. 2012; Conley et al. 2011). As part of the
efforts towards understanding DE, we expect many thousands
of SNe candidates from large photometric surveys, such as
the Large Synoptic Survey Telescope (LSST) (Tyson 2002),
SkyMapper (Schmidt et al. 2005) and the Dark Energy Survey
(DES) (Wester & Dark Energy Survey Collaboration 2005).
However, with rapidly increasing available data, it is
already impracticable to provide spectroscopic confirmation
for all potential SNe Ia discovered in large field imaging
surveys. After a great effort in allocating their resources
for spectroscopic follow-up, the SuperNova Legacy Survey
(SNLS) (Astier et al. 2006) and the Sloan Digital Sky Survey
(SDSS) (York et al. 2000) were able to provide confirmation
for almost half of their light curves. These constitute the
major SNe Ia samples currently available, but it is very unlikely
that their power of spectroscopic follow-up will continue to
increase as it did in the last decade (Kessler et al. 2010). In
this context, we do not have much choice left other than to
develop (or adapt) statistical and computational tools which
allow us to perform classification on photometric data alone.
Beyond that, such tools should ideally provide a quick and
flexible framework, where information from new data may
be smoothly added to the pipeline.

Trying to solve this puzzle, in recent years a good
diversity of techniques has been applied to the problem of SNe
photometric classification (Poznanski et al. 2002; Johnson
& Crotts 2006; Sullivan et al. 2006; Poznanski et al. 2007;
Kuznetsova & Connolly 2007; Kunz et al. 2007; Sako et al.
2008; Rodney & Tonry 2009; Gong et al. 2010; Falck et al.
2010). Most of them use the idea of template fitting, so the
classification is estimated by comparison between the
unlabelled SN and a set of confirmed light curve templates. The
method starts with the hypothesis that the new, unlabelled,
light curve belongs to one of the categories in the template
sample. The procedure then continues to determine which
category best resembles the characteristics of this new object.
It produced good results (Sako et al. 2008), but its final
classification rates are highly sensitive to the characteristics of
the template sample.

To overcome such difficulty, Newling et al. (2011) and Sako
et al. (2011) describe different techniques which assign a
posterior probability to each classification output. These
algorithms produce not a specific type for each SN, but a
probability of belonging to each one of the template classes.
Such an improvement allows the user to impose selection cuts
on posterior probability and, for example, use for cosmology
only those SNe with a high probability of being Ia.

Another interesting approach, proposed by Kunz et al.
(2007) and further developed by Newling et al. (2012), takes
a somewhat different path. Instead of separating between Ia
and non-Ia before the cosmological analysis, they use all the
available data. However, the influence of each data point
in determining the cosmological parameters is weighted
according to its posterior probability (obtained from some
classifier like that of Sako et al. (2011), for example). The
method was able to identify the fiducial cosmological
parameters in a simulated data set, although some bias still
remains and deserves further investigation.

Following a different line of thought, Richards et al.
(2012) (hereafter R2012) propose the use of diffusion maps
to translate each light curve into a low dimensional parameter
space. Such space is constructed using the entire sample
and, after a suitable representation is found, the labels of the
spectroscopic sample are revealed. In the final step a random
forest classification algorithm is used to assign a label to the
photometric light curves, based on their low dimensional
distribution when compared to the one from the
spectroscopically confirmed SNe. Results were comparable to template
fitting methods in a simulated data set, but it also showed
large sensitivity to the representativeness between training
and test samples.


More recently, Karpenka et al. (2012) presented a two-step
algorithm where each light curve in the spectroscopic
sample is first fitted to a parametric function. The values of
the parameters found are subsequently used in training a
neural network (NN) algorithm. The NN is then applied to the
photometric sample and, for each light curve, it returns the
probability of being a Ia. Their classification results are
overall independent of the redshift distribution and, as in other
analyses cited before, can vary significantly depending on the
training sample used.

In order to better understand and compare the state of
the art of photometric classification techniques, Kessler et al.
(2010) released the SuperNova Photometric Classification
Challenge (hereafter, SNPCC). It consisted of a blind sample
of ~20,000 SNe light curves, generated using the SuperNova
ANAlysis¹ (SNANA) light curve simulator (Kessler
et al. 2009), and designed to mimic data from the DES.
Approximately 1000 of these were given with labels, so as to
represent a spectroscopically confirmed sub-sample. The
participants were offered 2 instances of the data, with and without
the host galaxy photometric redshift (photo-z). Around a
dozen entries were submitted to the Challenge and, although
none of them obtained an outstanding result when compared
to the others, it provided a clear picture of what can be done
currently and what we should require from future surveys
in order to improve photometric classifications. There was
also an instance of the data containing only observations
before maximum, which aimed at choosing potential
spectroscopic follow-up candidates. However, this data set did not
receive replies from the participants (Kessler et al. 2010).
After the Challenge, the organizers released an updated
version of the data, including all labels, bug fixes and other
improvements found necessary during the competition². The
works of Newling et al. (2011), R2012 and Karpenka et al.
(2012) present detailed results from applying their algorithms
to this post-SNPCC³ data.

Given the stimulating activity in the field of SNe
photometric classification, and the urgency with which the
problem imposes itself, our purpose here is to present an
alternative method which optimizes purity in the final SNe Ia
sample, in order to provide a statistically significant number of
photometrically classified SNe Ia for cosmological analysis.
Our algorithm uses a machine learning approach, similar in
philosophy to the entry of R2012 submitted to the SNPCC.
This class of statistical tools has already been applied to a
variety of astronomical topics (for a recent review see Ball
& Brunner (2010)).

We propose the use of Kernel Principal Component 
Analysis (hereafter, KPCA) as a tool to find a suitable low 
dimension representation of SNe light curves. In construct- 
ing this low dimensional space only the spectroscopically 
confirmed sample is used. Each unlabelled light curve is 
then projected into this space one at a time and a k-nearest 
neighbour (kNN) algorithm performs the classification. The 
procedure was applied to the post-SNPCC data set using
the entire light curves and also using only pre-maximum
observations. In order to allow a more direct comparison with
SNPCC results, we also applied the algorithm to the
complete light curves in the original SNPCC data set⁴.

1 http://sdssdp62.fnal.gov/sdsssn/SNANA-PUBLIC/

2 http://sdssdp62.fnal.gov/sdsssn/SIMGEN_PUBLIC/

3 Nomenclature taken from Newling et al. (2011).

Our procedure returns purity levels higher than the
top-ranked methods reported in the SNPCC. The results are
sensitive to the spectroscopic sample, but more to the quality
of each individual observation than to the representativeness
between spectroscopic and photometric samples.
Assuming that results can only be as good as the input data, we
perform classification in sub-samples of SNPCC and
post-SNPCC data based on signal to noise ratio (SNR) levels.
Considering only light curves with Multi-color Light
Curve Shape (MLCS2k2) (Jha et al. 2007) fit probability
FitProb>0.1, we demonstrate that our method is capable of
increasing purity and successful classification rates even in
a context where all light curves are very similar to each
other.

The paper is organized as follows: section 2 briefly
describes linear PCA and its transition to the KPCA formalism.
In section 3 we detail the cross-validation and kNN
algorithm used for classification. In section 4 we present the
guidelines to prepare the raw light curve data into a data
vector suitable for KPCA. The results applied to a best case
scenario simulation, to the post-SNPCC data and comparison
with MLCS2k2 fit probability results are shown in
section 5. We report outcomes from applying our method to
the original SNPCC data set in section 6. Finally, we discuss
the results and future perspectives in section 7. Throughout
the text, mainly in section 2, we refer to a few theorems
and make some mathematical statements. Those which are
most crucial for the development of the KPCA argument
are briefly demonstrated in appendix A. A detailed description
of results achieved using the linear PCA+kNN algorithm is
presented in appendix B. Appendix C shows classification
rates as a function of redshift and SNR cuts and appendix
D displays our achievements when no SNR cuts are applied.
Graphical representations of results from the SNPCC data set
for all the tests we performed, which can be directly compared
to those of Kessler et al. (2010), are displayed in
appendix E. Complete summary tables reporting the number
of data points in different sub-samples of SNPCC and
post-SNPCC data and classification results mentioned in the text
are shown in appendix F.



2 PRINCIPAL COMPONENT ANALYSIS 

The main goal of PCA is to reduce an initial large num- 
ber of variables to a smaller set of uncorrelated ones, called 
Principal Components (PCs). This set of PCs is capable of 
reproducing as much variance from the original variables as 
possible. Each of them can be viewed as a composite variable 
summarizing the original ones, and its eigenvalue indicates 
how successful this summary is. If all variables are highly 
correlated, one single PC is sufficient to describe the data. 
If the variables form two or more sets, and correlations are 
high within sets and low between sets, a second or third PC
is needed to summarize the initial variables. PCA solutions
with more than one PC are referred to as multi-dimensional
solutions. In such cases, the PCs are ordered according to
their eigenvalues. The first component is associated with the
largest eigenvalue, and accounts for most of the variance, the
second accounts for as much as possible of the remaining
variance, and so on.

4 http://www.hep.anl.gov/SNchallenge/DES_BLINDnoHOSTZ.tar.gz

There are a few different ways which lead to the
determination of PCs. In particular, we have already shown that
it is possible to derive the PCs beginning from a theoretical
description of the likelihood function (e.g. Ishida & de
Souza 2011; Ishida et al. 2011).

In the present work we are interested in exploring the
KPCA and, as a consequence, our description shall be based
on dot products. In doing so, the connection between PCA
and KPCA occurs almost smoothly. We follow closely
Hofmann et al. (2008) and Max Welling's notes A first encounter
with Machine Learning⁵, to which the reader is referred for a
more complete mathematical description of the steps shown
here.



2.1 Linear PCA 

We begin by defining a set of $N$ vectors $G = \{\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_N\}$,
which contains our observational measurements. If $\mathbf{g}_{\rm mean}$ is
the vector of mean values of $G$, let $X \in \mathbb{R}^n$ be the set of
vectors which holds the centered observations,

$$\mathbf{x}_k = \mathbf{g}_k - \mathbf{g}_{\rm mean}. \qquad (1)$$

In order to find the PCs, we shall diagonalize the
covariance matrix⁶,⁷

$$C = \frac{1}{N} \sum_{j=1}^{N} \mathbf{x}_j \mathbf{x}_j^T. \qquad (2)$$

This can be accomplished by solving the eigenvalue equation

$$\lambda_a \mathbf{v}_a = C \mathbf{v}_a, \qquad (3)$$

where $\lambda_a \geq 0$ are the eigenvalues and $\mathbf{v}_a \in \mathbb{R}^n$ the
eigenvectors of the covariance matrix.

If we consider $V$ the set of eigenvectors of $C$ and $P$ the
set of data point projections in $V$, the elements of $P$ will
be given by

$$\mathbf{p}_i = A_l^T \mathbf{x}_i, \qquad (4)$$

where $A_l$ is the matrix formed by the first $l$ PCs as columns.
It is possible to show that the elements of $P$ will be
uncorrelated, independently of the dimension chosen for matrix
$A_l$.

5 http://www.ics.uci.edu/~welling/teaching/ICS273Afall11/IntroMLBook.pdf

6 The covariance matrix is traditionally defined as the
expectation value of $\mathbf{x}\mathbf{x}^T$. For convenience, we shall use the
term covariance matrix for the maximum likelihood estimate
of the covariance matrix for a finite sample, given by equation
(2) (Schölkopf et al. 1996).

7 It is also possible to apply PCA to a correlation matrix. This
is advised mainly when the data matrix is composed of
measurements with different orders of magnitude and/or units. Since in
our particular case all measurements are in the same units (fluxes)
and normalized in advance, we shall use the covariance matrix.
For a detailed discussion on the pros and cons of each case, see
Jolliffe (2002), section 2.3.

This is where the dimensionality reduction takes place.
We can choose the number of PCs that will compose the
matrix $A_l$ based on how much of the initial variance we are
willing to reproduce in $P$. At the same time, depending on
the nature of our data, the spread of the points in the PCs
space might also reveal some underlying information, such as the
existence of two classes of data points, for example.

The main goal of this work is to use PCA to project
the data in a sub-space where photometric data vectors
associated with different supernova types can be separated. In
order to do so, our first step is to show that it is possible
to calculate the projected data points $\in P$ without the need
of explicitly defining the eigenvectors $\in V$. This will be
important when we consider non-linear correlations in the next
sub-section.

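Given that all vectors $\in V$ must lie in the space spanned
by the data vectors $\in X$, we can show that (see appendix A)

$$\mathbf{v}_a = \sum_{i=1}^{N} \alpha_i^a \mathbf{x}_i,$$

with

$$\alpha_i^a = \frac{\mathbf{x}_i^T \mathbf{v}_a}{N \lambda_a}, \qquad (5)$$

and as a consequence, instead of solving equation (3) we
can also find the elements of $P$ by solving the projected
equations⁸

$$\mathbf{x}_i^T C \mathbf{v}_a = \lambda_a \mathbf{x}_i^T \mathbf{v}_a, \qquad \forall\, i, a. \qquad (6)$$

This leads us to an eigenvalue equation in the form

$$K \boldsymbol{\alpha}^a = \tilde{\lambda}_a \boldsymbol{\alpha}^a, \qquad (7)$$

where

$$K_{ij} = \mathbf{x}_i^T \mathbf{x}_j \qquad (8)$$

and $\tilde{\lambda}_a = N \lambda_a$. Normalizing $\mathbf{v}_a$, we can also show that
$\|\boldsymbol{\alpha}^a\| = 1/\sqrt{N \lambda_a}$.

Finally, consider a test data vector $\mathbf{n}$. Its projections in
the PCs space are given by

$$\mathbf{v}_a^T \mathbf{n} = \sum_{i=1}^{N} \alpha_i^a \mathbf{x}_i^T \mathbf{n} = \sum_{i=1}^{N} \alpha_i^a K(\mathbf{x}_i, \mathbf{n}), \qquad (9)$$

where $K(\mathbf{x}_i, \mathbf{n}) = \mathbf{x}_i^T \mathbf{n}$.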

This demonstration was specifically designed to rely
only on the matrix K. However, the classification we aim at in
this work is not possible in the linear regime. In order to be
able to disentangle light curves from different supernovae,
we need to perform PCA in a higher dimensional space,
where the characteristics we are interested in are linearly
correlated.
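
The equivalence between diagonalizing the covariance matrix and working only with dot products can be checked numerically. The following is a minimal Python sketch, assuming NumPy and a synthetic data set; all variable names are illustrative and are not taken from the original analysis.

```python
import numpy as np

# Synthetic stand-in for the measurements G: N = 50 vectors in R^4 (assumption).
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
g_mean = G.mean(axis=0)
X = G - g_mean                       # equation (1): centred observations
N, n_dim = X.shape

# Route 1: diagonalize the covariance matrix (equations 2 and 3).
C = X.T @ X / N
lam, V = np.linalg.eigh(C)           # eigh returns eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]       # reorder from largest to smallest

# Route 2: eigen-decomposition of K_ij = x_i^T x_j (equations 7 and 8).
K = X @ X.T
lam_tilde, alpha = np.linalg.eigh(K)
lam_tilde = lam_tilde[::-1][:n_dim]  # only n_dim eigenvalues are non-zero
alpha = alpha[:, ::-1][:, :n_dim]
alpha = alpha / np.sqrt(lam_tilde)   # enforces ||alpha^a|| = 1/sqrt(N lambda_a)

# Projection of a centred test vector n (equation 9).
n = rng.normal(size=n_dim) - g_mean
proj_cov = V.T @ n                   # uses the eigenvectors explicitly
proj_gram = alpha.T @ (X @ n)        # uses only dot products with the data
```

In this sketch the eigenvalues satisfy `lam_tilde = N * lam`, and the two projections agree up to the arbitrary sign of each eigenvector.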



2.2 Kernel Principal Component Analysis 

KPCA generalizes PCA by first mapping the data
non-linearly into a higher dimensional dot product space $F$
(hereafter, feature space)

$$\Phi : \mathbb{R}^n \rightarrow F, \qquad \mathbf{x} \rightarrow \Phi(\mathbf{x}), \qquad (10)$$

where $\Phi$ is a nonlinear function and $F$ has arbitrary (usually
very large) dimensionality.

The covariance matrix in feature space, $C_F$, will be defined similarly,

$$C_F = \frac{1}{N} \sum_{j=1}^{N} \Phi(\mathbf{x}_j) \Phi(\mathbf{x}_j)^T. \qquad (11)$$

We assume that $\Phi(\mathbf{x}_i)$ are centred in feature space. We shall
come back to this point later on.

Consider $\mathbf{v}_\Phi^l$ the $l$-th eigenvector of $C_F$ and $\lambda_\Phi^l$ its
$l$-th eigenvalue. Using the same line of argument shown
in the previous subsection, we can define an $N \times N$ kernel
matrix

$$K_F(\mathbf{x}_i, \mathbf{x}_j) = (\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)), \qquad (12)$$

which allows us to compute the value of the dot product in $F$
without having to carry out the map $\Phi$. The kernel function
has to satisfy Mercer's theorem to ensure that it is
possible to construct a mapping into a space where $K_F$ acts
as a dot product⁹. The projection of a new test point, $\mathbf{n}$, is
given by

$$(\mathbf{v}_\Phi^l \cdot \Phi(\mathbf{n})) = \sum_{i=1}^{N} \alpha_{\Phi,i}^l K_F(\mathbf{x}_i, \mathbf{n}), \qquad (13)$$

where $\boldsymbol{\alpha}_\Phi$ is defined by the solutions to the eigenvalue
equation $N \lambda_\Phi \boldsymbol{\alpha}_\Phi = K_F \boldsymbol{\alpha}_\Phi$.

Finally, it is important to stress that all the arguments
shown in this sub-section rely on the assumption that the
data are centred in feature space. This is not a direct
consequence of using $X$ instead of $G$. Equation (1) is responsible
for centring data vectors in $\mathbb{R}^n$; in order to perform
centralization in $F$, we need to construct the kernel matrix using
$\Phi(\mathbf{x}) - \bar{\Phi}$, where $\bar{\Phi}$ is the mean of the mapped data vectors.
This can also be computed without any information about the
function $\Phi$. It is shown in appendix A that
the centred kernel matrix, $\tilde{K}_F$, can be expressed in terms of
the non-centred kernel matrix, $K_F$, as

$$\tilde{K}_F = K_F - 1_N K_F - K_F 1_N + 1_N K_F 1_N, \qquad (14)$$

where $(1_N)_{ij} = 1/N$. The reader should be aware that we
always refer to the centred kernel matrix $\tilde{K}_F$. However, for
the sake of simplicity, the tilde is not used in our notation.

At this point, we have the tools necessary to compute
the centred kernel matrix based on dot products in input
space. However, we still need to choose a form for the kernel
function $k(\mathbf{x}_i, \mathbf{x}_j) := K_{F,ij}$.

In the present work, for the sake of simplicity, we make
an a priori choice of using a Gaussian kernel,

$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left[-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right], \qquad (15)$$

where the value of $\sigma$ is determined by a cross-validation
process (see subsection 3.2). However, it is important to



8 Equation [fj] results from writing each eigenvector as a linear 
combination of the data vectors. 



9 http://ni.cs.tu-berlin.de/lehre/mi-materials/Mercer_theorem.pdf
emphasize that there is extensive literature on how to choose
the appropriate kernel for each particular data set at hand
(Lanckriet et al. 2004; Zang et al. 2006). Comparing the
analysis between different kernel choices is out of the scope
of this work. As our goal is to focus on the KPCA procedure
itself, we are using the standard kernel choice. An analysis of
performances from different kernel choices within the KPCA
framework should certainly be a topic of future research.
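
To make the KPCA steps of this sub-section concrete, the following is a minimal Python sketch, assuming NumPy. The function names (`gaussian_kernel`, `kpca_fit`, `kpca_project`) are hypothetical helpers introduced for illustration, not the code used in this work. It builds the Gaussian kernel of equation (15), centres it as in equation (14), solves the eigenvalue problem and projects new points following equation (13).

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel (equation 15) between every row of A and every row of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kpca_fit(X, sigma, n_pcs):
    """Solve the centred kernel eigenvalue problem for the training vectors X."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one    # equation (14)
    lam, alpha = np.linalg.eigh(Kc)               # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:n_pcs]           # keep the largest n_pcs
    lam, alpha = lam[idx], alpha[:, idx]
    alpha = alpha / np.sqrt(lam)                  # unit-norm eigenvectors in feature space
    return {"X": X, "K": K, "alpha": alpha, "sigma": sigma}

def kpca_project(model, Y):
    """Project the rows of Y on the retained PCs (equation 13), one row per point."""
    X, K, alpha = model["X"], model["K"], model["alpha"]
    N = X.shape[0]
    Kt = gaussian_kernel(Y, X, model["sigma"])    # kernel between new and training points
    one = np.full((N, N), 1.0 / N)
    one_t = np.full((Y.shape[0], N), 1.0 / N)
    # Centre the new kernel rows consistently with the training sample.
    Ktc = Kt - one_t @ K - Kt @ one + one_t @ K @ one
    return Ktc @ alpha

# Toy usage with synthetic vectors (assumption: 30 training points in R^5).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 5))
model = kpca_fit(X_train, sigma=2.0, n_pcs=3)
P_train = kpca_project(model, X_train)
```

Because each new point only requires its kernel values against the training sample, points can be projected one at a time without repeating the eigen-decomposition.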



3 CLASSIFICATION 

By virtue of what was presented so far, we have a set of
centred data points, $X$, and a kernel function, $k(\mathbf{x}_i, \mathbf{x}_j)$. This
allows us to calculate the kernel matrix in feature space,
$K_F$, and its corresponding eigenvectors, $\boldsymbol{\alpha}_\Phi$. Using equation
(13), we can obtain the projection of each data point in the
eigenvectors of $C_F$.

From now on, we will work in the space spanned by
these eigenvectors. More precisely, we will look for a
2-dimensional sub-space of $\mathbf{v}_\Phi$ which can optimize our ability
to separate the projected data in 2 different classes (namely
Ia and non-Ia supernovae). We chose to keep this sub-space
bi-dimensional in order to avoid over-fitting to the particular
data set we are analysing.

The procedure described before is now applied to two
different instances of our data. A data set suitable for the
analysis we present here must be composed of two sub-samples.
For one of them we have the appropriate label for each data
point (we know which class they belong to); from now on
this sub-set will be called the training sample. For the other
sub-sample (hereafter test sample) the labels are not
available, and we want to classify them based on our previous
knowledge about the training sample.

At first, we concentrate our efforts on the
training sample. Its projections in a certain pair of PCs are
calculated through equation (13). Given that labels of data
in this sample are known, we can calculate projections in
different PCs and determine which PC pair better translates
the initial light curves into a separable point configuration.



3.1 The k-Nearest Neighbor algorithm 

Our choice of which subspace of $\mathbf{v}_\Phi$ is more adequate for a
specific data situation will be balanced by how well we can
classify the training sample using the k-Nearest Neighbor
algorithm (kNN).

kNN is one of the simplest classification algorithms
and it has proved efficient in low dimensional parameter
spaces (dim ≲ 10; for a further discussion on kNN
performance in higher dimensions see Beyer et al. (1999)).
The method begins with the training sample organized as
$q_i = (\mathbf{x}_i, y_i)$, where $\mathbf{x}_i$ is the $i$-th data vector and $y_i$ its
label, and a definition of distance between 2 data vectors,
$d(\mathbf{x}_i, \mathbf{x}_j)$. Given a new unlabelled test point $q_t = (\mathbf{x}_t, \ )$, the
algorithm computes the distance between $\mathbf{x}_t$ and all the
points in the training sample, $d(\mathbf{x}_t, \mathbf{x})$, ordering them from
lower to higher distance. The labels of the first $k$ data vectors
(the ones closer to $\mathbf{x}_t$) are counted as votes in the definition
of $y_t$. Finally, $y_t$ is set as the label with the highest number
of votes. Given this last voting characteristic, kNN is often
referred to as a type of majority vote classifier (James
1998).

Throughout our analysis, we used a Euclidean distance
metric and order k = 1. As this is the first attempt at
applying KPCA to the photometric problem, we chose to be
bounded by the Bayes error rate (hereafter, BER). The BER
is defined as the error rate resulting from the best possible
classifier. It can be shown that, in the limit of large samples,
the error rate of a k = 1 nearest neighbour algorithm
is never larger than 2 × BER (for a sketch of the proof see
Ripley (1996), page 195). From now on, this will be referred
to as the 1NN algorithm (nearest neighbour with k = 1).
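
The voting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy, a Euclidean metric and a hypothetical helper name `knn_classify`; with the default `k=1` it reduces to the 1NN rule.

```python
from collections import Counter

import numpy as np

def knn_classify(train_x, train_y, test_x, k=1):
    """Label each test vector by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.atleast_2d(np.asarray(test_x, dtype=float))
    labels = []
    for x_t in test_x:
        d = np.sqrt(((train_x - x_t) ** 2).sum(axis=1))  # Euclidean distances to x_t
        nearest = np.argsort(d)[:k]                      # indices of the k closest points
        votes = Counter(train_y[i] for i in nearest)     # one vote per neighbour
        labels.append(votes.most_common(1)[0][0])
    return labels

# Toy training sample in a 2-dimensional space (synthetic values).
train_x = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["Ia", "Ia", "non-Ia", "non-Ia"]
predicted = knn_classify(train_x, train_y, [[0.4, 0.2], [5.5, 4.0]])
```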

So far we described how to define a convenient
2-dimensional space where our data points will be separated
in Ia and non-Ia populations (sub-section 2.2) and a
classification tool that allows us to add a label to a new, unlabelled
data point (sub-section 3.1). However, we still need to define
which pair of PCs of the feature space better maps our data.
This is done in the next sub-section.



3.2 Cross-validation 

The main idea behind the cross-validation procedure is to
remove from the training sample a random set of $M$ data
points, $T^{\rm out}$. The remaining part of the training sample is
given as input to some classifier algorithm and used to
classify the points in $T^{\rm out}$. In this way, we can measure the
success rate of the classifier over different random choices of
$T^{\rm out}$ and also compare results from different classifiers given
the same training and $T^{\rm out}$ sets (for a complete review on
cross-validation methods see Arlot & Celisse (2010)).

The number of points in $T^{\rm out}$ is a free parameter and
must be defined based on the clustering characteristics of
the given data set. Here we chose the most classical exhaustive
data splitting procedure, sometimes called the Leave One
Out (LOO) algorithm. As the name states, we construct $N$
sub-samples $T^{\rm out}$, each one containing only one data point,
$M = 1$. The training sample is then cross-validated and the
performance judged by the average number of correct
classifications.

Data exhaustive algorithms like LOO have a larger
variance in the final results; however, they are highly
recommended for avoiding biases regarding local data clustering
and some non-uniform geometrical distribution of data
points in a given parameter space¹⁰.
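
As an illustration of the LOO procedure combined with a k = 1 nearest neighbour classifier, the following minimal Python sketch (assuming NumPy; the function name `loo_1nn_success_rate` is hypothetical) removes each point in turn, classifies it using the remaining ones and returns the fraction of correct classifications.

```python
import numpy as np

def loo_1nn_success_rate(points, labels):
    """Leave-one-out success rate of a 1NN classifier with Euclidean distance."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=2)        # squared distances between all pairs
    np.fill_diagonal(d2, np.inf)        # the left-out point cannot vote for itself
    nearest = d2.argmin(axis=1)         # nearest remaining neighbour of each point
    return float(np.mean(labels[nearest] == labels))

# Two well separated synthetic groups: every point is recovered correctly.
rate_clean = loo_1nn_success_rate(
    [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], [0, 0, 1, 1]
)
```

Setting the diagonal of the distance matrix to infinity is what implements the "leave one out" step here: each point is classified as if it had been removed from the training sample.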

3.2.1 The algorithm 

In the context of KPCA, we used LOO and 1NN algorithms
to decide the appropriate pair of PCs and value of $\sigma$
(equation (15)) for each data set.



The next tricky question to answer is: which PCs
should we test with the algorithms described before? Obviously
there is a high number of vectors in $\mathbf{v}_\Phi$ and it would not be
possible to test all available pairs. Fortunately, we can make
use of the fact that the first eigenvectors $\mathbf{v}_\Phi$ (those with
larger eigenvalues) represent directions of greater data
variance in feature space. Although we cannot visualize such
vectors, it is easy to confirm that the magnitudes of data
point projections in $\mathbf{v}_\Phi^l$ become very similar to each other
for higher $l$. In other words, the smaller eigenvalues
correspond to PCs carrying mostly noise, so their projections will,
on average, be very similar, and meaningless (Schölkopf et al.
1996). For classification purposes, one expects that the PC
pair tailored to provide geometrical separation of the data
projection into classes will be among the PCs with higher
eigenvalues.

10 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf

Figure 1. Normalized light curves from SIM1. Left: SNe Ia light
curve. The plot shows the flux measurements (blue dots) and
fitted spline function (red curve), normalized as explained in the
text. Right: Example of normalized light curve functions for Ia
(red thick), Ib (green dashed), Ibc (orange short-dashed), Ic (cyan
dashed), IIL (blue thin), IIn (brown short-dashed) and IIP
(purple dot-dashed), according to SNANA classification. The panels
from top to bottom run over the DES filters {g, r, i, z}. The
horizontal axis is in units of days since maximum brightness in r
band.

For the case studied here, we restrict ourselves to
testing the first 5 PCs in a first round and extend the search to
other PCs only if the classification success rate does not
monotonically decrease with the use of higher PCs. In the same
line of thought, we start our search with $\sigma \in [0.1, 2.0]$ in a
grid with steps of 0.1 and make this interval wider only if
the results do not converge after a first round of evaluations.

The cross-validation algorithm we used is better
summarized as:

(i) Pick a PC pair, {PC_A, PC_B}.

(ii) Define a grid of values for parameter $\sigma$,
$\sigma \in [\sigma_{\rm min}, \sigma_{\rm max}]$.

(iii) Pick a value from the above grid, $\sigma_{\rm test}$.

(iv) Cross-validate the training sample using the KPCA
projections in the chosen PCs, 1NN and LOO algorithms.

(v) Calculate the average classification success rate for
{$\sigma_{\rm test}$, PC_A, PC_B}.

(vi) Repeat steps (ii) to (v) 10 times. If the average
number of successful classifications monotonically decreases
towards the upper and lower boundaries of $\sigma$, go to step (vii). If not,
repeat steps (ii) to (vi) until it does.

(vii) Repeat steps (i) to (vi) for all pairs {A, B} with
A, B ∈ {1, ..., 5}.

(viii) If the average number of successful classifications
monotonically decreases when using higher PCs, go to step
(ix). Otherwise, consider A, B ∈ {1, ..., 10} and repeat steps
(i) to (viii).

(ix) Choose for {$\sigma$, PC_A, PC_B} the values corresponding to
the largest average number of successful classifications.

Table 1. Description of the light curve selection cuts. The SNe
were required to have at least one observation in t ≤ t_low, one in t ≥ t_up
and at least 3 observations satisfying a given SNR requirement in
each filter in order to be included in any of the data sets analysed
in this work. These selection cuts were applied for training and
test samples within a specific data set.

Once the cross-validation is completed, we use the
resulting parameter values to calculate the training sample
projections in PC space. We can finally use the 1NN algorithm
to assign a label to each data point in the test sample.
The final procedure of classifying the test sample is called
the KPCA+1NN algorithm throughout the text.
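
The core of the search over $\sigma$ and PC pairs listed above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the implementation used in this work: it assumes NumPy, performs a single sweep over a fixed $\sigma$ grid and the first 5 PCs (the widening of the grid and of the PC range in steps (vi) and (viii) is omitted), and computes the KPCA projections once per $\sigma$ for the whole training sample before applying LOO in the projected space. All function names are hypothetical.

```python
import itertools

import numpy as np

def training_projections(X, sigma, n_pcs):
    """KPCA projections of the training sample on its first n_pcs PCs."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / (2.0 * sigma ** 2))               # Gaussian kernel, equation (15)
    one = np.full(K.shape, 1.0 / len(X))
    Kc = K - one @ K - K @ one + one @ K @ one         # centring, equation (14)
    lam, alpha = np.linalg.eigh(Kc)                    # ascending eigenvalues
    lam, alpha = lam[::-1][:n_pcs], alpha[:, ::-1][:, :n_pcs]
    # Kc @ (alpha / sqrt(lam)) simplifies to alpha * sqrt(lam).
    return alpha * np.sqrt(np.clip(lam, 0.0, None))

def loo_1nn_success_rate(P, y):
    """Leave-one-out success rate of 1NN in the projected space."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                       # exclude the left-out point
    return float(np.mean(y[d2.argmin(axis=1)] == y))

def cross_validate(X, y, sigma_grid, n_pcs=5):
    """Return (best rate, best sigma, best PC pair) over the grid; PCs are 1-indexed."""
    best_rate, best_sigma, best_pair = -1.0, None, None
    for sigma in sigma_grid:
        P = training_projections(X, sigma, n_pcs)
        for a, b in itertools.combinations(range(n_pcs), 2):
            rate = loo_1nn_success_rate(P[:, [a, b]], y)
            if rate > best_rate:
                best_rate, best_sigma, best_pair = rate, sigma, (a + 1, b + 1)
    return best_rate, best_sigma, best_pair

# Synthetic training sample: two separated groups in R^4 (assumption).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 4)), rng.normal(3.0, 0.5, size=(20, 4))])
y = np.array([0] * 20 + [1] * 20)
sigma_grid = np.arange(0.1, 2.05, 0.1)
best_rate, best_sigma, best_pair = cross_validate(X, y, sigma_grid)
```

The selected `best_sigma` and `best_pair` would then define the 2-dimensional space in which each test point is projected and labelled by its nearest training neighbour.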

The framework described so far can be applied to any
set of astrophysical objects, as long as we have a training
and a test sample. The cross-validation procedure is performed
only in the training sample and each point in the
test sample is classified one at a time. This avoids running the
whole machinery again every time a new point is added to
the test sample, and prevents us from introducing misleading
data as part of the features to be mapped by the PCs.
However, the parameter space composed of the PC pair and
the value of $\sigma$ can always be updated if we have at hand new
data points whose types are known. Only then is it necessary
to re-run the cross-validation process.

From now on we focus on the problem of photometrically
classifying SNe Ia as a practical example, although the
exact same steps could be applied to any transient with
observable light curves. In the next section, we describe how
the light curve data should be prepared before we try to
classify them.



4 LIGHT CURVE PREPARATION 

In case we have $b$ different filters, the observational
data available from the $l$-th SN can be arranged as
$F^l = \{F^l_1, ..., F^l_b\}$. Considering the $i$-th filter,
$F^l_i = \{\{t^l_{i1}, F^l_{i1}, \sigma^l_{F_{i1}}\}, ..., \{t^l_{ie}, F^l_{ie}, \sigma^l_{F_{ie}}\}\}$.
In our notation, $t^l_{ij}$ corresponds to the $j$-th observation epoch
(in MJD), $F^l_{ij}$ is the measured flux at $t^l_{ij}$,
$\sigma^l_{F_{ij}}$ is the error in the flux measurement and $e$ is the
total number of observation epochs in filter $i$.

Figure 2. Classification results from SIM1. Blue circles (Ia) and
purple squares (non-Ia) represent the geometrical locus defined
by the training sample. Top: Red dots correspond to SNe Ia in
the test sample. Bottom: Cyan dots correspond to non-Ia SNe
in the test sample. The plot also shows calculated values for eff$_A$
and pur.

Our next task is to translate the time of each observation
from MJD to the time since maximum brightness in
a particular filter. The choice of reference filter does not
have much influence on the final result. Ideally, one should
choose the band where the time of peak brightness can be
determined most reliably, and use that reference band
for all SNe in the sample. The time of maximum brightness
in our reference band for the $l$-th SN is referred to as $t^l_{\rm max}$.
As a result, we obtain data points in a particular filter $i$ as



$$F^l_i = \{\{(t^l_{\rm max})_{i1}, F^l_{i1}, \sigma^l_{F_{i1}}\}, ..., \{(t^l_{\rm max})_{ie}, F^l_{ie}, \sigma^l_{F_{ie}}\}\},$$

where $(t^l_{\rm max})_{ij} = t^l_{ij} - t^l_{\rm max}$.



We must also deal with the fact that, in a real situation,
the input from observations consists of some non-uniform
sampling of the light curve in various (in most cases more than
3) different filters for each SN. However, it is necessary to
translate such information into a grid equally spaced in time.
This is done by using a cubic regression spline fit for each
light curve. The spline fit was chosen based on its ability to
fit non-uniform functions in a parameter independent manner.
As a consequence, we have a smooth light curve function
for each SN and filter.
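A cubic regression spline can be written as a weighted least-squares fit in a truncated power basis. The sketch below is one possible realisation under that assumption, not necessarily the fitting routine used here. The knot positions, the function names and the toy light curve are hypothetical.

```python
import numpy as np

def spline_basis(t, knots):
    """Truncated power basis of a cubic regression spline."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_light_curve(t_obs, flux, flux_err, knots):
    """Weighted least-squares fit; returns a callable light curve function."""
    w = 1.0 / np.asarray(flux_err, dtype=float)   # better epochs weigh more
    A = spline_basis(t_obs, knots) * w[:, None]
    coef, *_ = np.linalg.lstsq(A, np.asarray(flux, dtype=float) * w, rcond=None)
    return lambda t: spline_basis(t, knots) @ coef

# Toy, non-uniformly sampled light curve in one filter (not survey data).
t_obs = np.array([-4.8, -3.1, -1.0, 0.7, 2.9, 6.2, 9.5, 13.1, 16.0, 19.4, 22.3, 24.6])
flux = 100.0 - 0.2 * t_obs**2 + 0.004 * t_obs**3
flux_err = np.full_like(t_obs, 2.0)

curve = fit_light_curve(t_obs, flux, flux_err, knots=[5.0, 15.0])
grid = np.arange(-3.0, 24.5, 1.0)     # equally spaced epochs since maximum
flux_on_grid = curve(grid)
```

The returned callable can be evaluated on any equally spaced grid, which is what the construction of the data matrix requires.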

As a final step, we must keep the light curve functions
within a reasonable range (so as to avoid divergence in the
exponent of equation (15) due to very bright or dim sources, for
example). This is done through the normalization of the light
curve functions by the maximum flux measured in all filters
for a particular SN. In our notation $S^l_{N_i}(t)$ corresponds to
the normalized fitted light curve for the $l$-th SN in filter
$i$. The use of the same normalization factor for all filters for
a given SN ensures that the colour and shape of each light
curve are preserved.

Table 2. Mean values and standard deviations of the residual
between the simulated and derived date of peak brightness in each
band. The values were obtained through analysis of SNe Ia light
curves in the training samples of SIM1 and SNPCC.

          SIM1                                                SNPCC
filter    $\Delta t_{\rm max} \pm \sigma_{\Delta t_{\rm max}}$    $\Delta t_{\rm max} \pm \sigma_{\Delta t_{\rm max}}$
g         -3.7 ± 3.5                                          1.2 ± 27.1
r         -0.1 ± 2.6                                          0.9 ± 8.2
i          1.9 ± 2.8                                          2.3 ± 9.2
z          1.2 ± 3.6                                          3.4 ± 8.4

We now use the $S^l_N = \{S^l_{N_1}, ..., S^l_{N_b}\}$ functions in
order to construct our initial data matrix, G, composed of
N rows and M columns. Each row contains all information
available for a single SN and each column contains the flux
measurements in a specific observation epoch and filter. The
difference in time since maximum brightness between 2
successive columns of G is defined as $\Delta$ and for the purposes
of this work it is kept constant. However, we do address the
analysis with different values for $\Delta$ later on. The lowest and
highest observation epochs since $t^l_{\rm max}$ are referred to as
$t_{\rm low}$ and $t_{\rm up}$, respectively.

Throughout this work, we took the conservative approach
of not extrapolating functions $S^l_{N_i}(t)$ outside the
time domain covered by the data. In other words, we only
considered classifiable those SNe which have at least one
observation epoch $t \le t_{\rm low}$ and at least one epoch
$t \ge t_{\rm up}$, in all available filters. The values of
$t_{\rm low}$ and $t_{\rm up}$ must be chosen so as to include the
largest possible number of SNe and, at the same time, to probe an
interval of the light curve which possesses enough information to
satisfy our classification purposes. We applied the algorithm
considering the values of $t_{\rm low}$ and $t_{\rm up}$ shown in
table 1. The demand that this sampling must be fulfilled in all
filters could be relaxed, leading to an interesting study about the
importance and role of each frequency band. We leave that for
future work, focusing our efforts on data points for which
information is available in all bands.
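The selection requirements (time window coverage in every filter plus a minimum number of good-quality epochs) can be expressed as a simple filter. The sketch below is illustrative only. The data layout, the function name and the toy numbers are assumptions, and SNR is taken as flux divided by flux error.

```python
def is_classifiable(light_curve, t_low, t_up, snr_min, n_good=3):
    """light_curve maps filter name -> list of (t_since_max, flux, flux_err)."""
    for epochs in light_curve.values():
        times = [t for t, _, _ in epochs]
        good = sum(1 for _, f, e in epochs if e > 0 and f / e >= snr_min)
        # Need one epoch at or before t_low, one at or after t_up,
        # and at least n_good epochs above the SNR threshold, in every filter.
        if not times or min(times) > t_low or max(times) < t_up or good < n_good:
            return False
    return True

# Hypothetical SN observed in two filters (toy values).
sn = {
    "g": [(-5.0, 40.0, 5.0), (2.0, 90.0, 6.0), (12.0, 60.0, 5.0), (26.0, 20.0, 5.0)],
    "r": [(-4.0, 50.0, 5.0), (3.0, 100.0, 6.0), (15.0, 70.0, 5.0), (25.0, 30.0, 5.0)],
}
passes_snr5 = is_classifiable(sn, t_low=-3.0, t_up=24.0, snr_min=5.0)
passes_snr10 = is_classifiable(sn, t_low=-3.0, t_up=24.0, snr_min=10.0)
passes_wider = is_classifiable(sn, t_low=-4.5, t_up=24.0, snr_min=5.0)
```

Relaxing the all-filters requirement would amount to replacing the early `return False` by a per-filter flag.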

Joining the previous ingredients, light curves from the
$l$-th SN sampled between $t_{\rm low}$ and $t_{\rm up}$ in steps of
length $\Delta$ are stored in a single row of G, sequentially for $b$
different filters. We can now use equations (1) and (15) to calculate
the centred data vectors and kernel matrix, respectively.
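The construction of one row of G can be sketched as follows. The fitted light curve functions are passed as callables. For simplicity the normalization here uses the maximum of the sampled fitted curves rather than the maximum measured flux, which is an assumption of this example. All names and toy curves are hypothetical.

```python
import numpy as np

def build_row(curves, t_low, t_up, delta):
    """curves: list of b callables, one fitted light curve per filter."""
    n = int(round((t_up - t_low) / delta)) + 1
    grid = t_low + delta * np.arange(n)            # epochs since maximum
    sampled = np.array([np.asarray(c(grid), dtype=float) for c in curves])
    # One common factor for all filters keeps colour and shape information.
    return (sampled / sampled.max()).ravel()       # filters stored sequentially

# Two toy filters; the second is twice as bright as the first.
curves = [lambda t: 10.0 - 0.01 * t**2, lambda t: 20.0 - 0.02 * t**2]
row = build_row(curves, t_low=-3.0, t_up=24.0, delta=1.0)
```

Stacking such rows for all SNe passing the selection cuts gives the N x M matrix G.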



5 APPLICATION 
5.1 Data sets 

We applied the procedure described so far to different samples
taken from the post-SNPCC data set. The post-SNPCC
consisted of ~20 000 SNe light curves, simulated according
to DES specifications and using the SNANA light curve
simulator. This large set is subdivided into 2 sub-samples: a small
spectroscopically confirmed one of 1103 light curves (training)
and a photometric sample of 20216 light curves (test).

Figure 3. Classification results from post-SNPCC, $D_1$+SNR5
data set. The training sample is represented by the blue circles
(Ia), purple squares (non-Ia) and pink diamonds (untyped). Top:
SNe Ia from the test sample (red dots) are superimposed on the
complete training set divided in Ia and non-Ia. Middle: Non-Ia
SNe from the test sample (yellow dots) are superimposed on the
complete training set as in the upper panel. Bottom: Training set
points including U as a possible classification type.



The role of the training sample was to mimic, in SNe types,
proportions and data quality, a spectroscopically confirmed
subset available for a survey like DES. After the challenge
results were released, the organizers made public an updated
version of the simulated data set (post-SNPCC), which was
used in most of this work. This updated data set is quite
different from the one used in the challenge itself (SNPCC), due
to a few bug fixes and other improvements aimed at a more
realistic representation of the data expected for DES. As a
consequence, its results should not be compared to those of
the SNPCC. A detailed comparison of our findings from the
post-SNPCC with others published after the challenge,
which use the same data set (namely, Newling et al. (2011),
R2012 and Karpenka et al. (2012)), is presented in section



m 

For the sake of completeness, we also present results
from applying our method to the SNPCC sample. Although
this sample contains the bugs mentioned before, it allows us
to coherently compare our method to a broader range of
alternatives. A detailed comparison of our results with those
reported in Kessler et al. (2010) is presented in section 6.

Figure 4. Results from the post-SNPCC data for pur (left), eff$_B$
(middle) and FoM$_B$ (right) as a function of redshift for $D_1$+SNR5
(alternative view of results shown in figure 3). The red-thick lines
correspond to results found for the test sample (cross-validated)
and blue-thick lines show results for the training sample. The right
panel also shows values for ppur (thin lines, blue for training and
red for test sample). These results were calculated for redshift
bins of width 0.2. Redshift dependent outcomes from SC, eff$_A$
and pur$_A$ for this sample are shown in figure C2.

Our first move is to check if KPCA can correctly classify
SNe light curves in a best-case scenario. In order to do so,
we generated a high quality data set, hereafter SIM1. This
set consists of 2206 SNe, composed of 2 sub-samples (training
and test), both with at least 3 observation epochs having
SNR$\geq$5 in all filters. SNe types and proportions in each
sub-set are the same as those found in the post-SNPCC training
sample. As a consequence, the 2 sub-samples in SIM1 are
completely representative of one another. This was done to
avoid classification problems found by other studies when
the training sample is not representative of the test sample
(e.g., Newling et al. (2011) and R2012). At this moment,
the purpose of SIM1 is only to perform a consistency check
for the KPCA and light curve preparation prescriptions
described above.

In generating SIM1, we used the input SNANA files
provided as part of the post-SNPCC package, and ran the
simulator until the required number of each SNe type passing
selection cuts was reached. The kernel matrix was constructed
considering $t^{\rm SIM1}_{\rm low} = -3$ and $t^{\rm SIM1}_{\rm up} = +24$.
After verifying that our algorithm was indeed effective in ideal
conditions, we focus on the analysis of the post-SNPCC
itself.

The Phillips relation for type Ia SNe can be considered the
first SNe Ia standardization procedure (Phillips 1993). It
establishes a correlation between the magnitude measured at
maximum brightness and the magnitude measured 15 days
after that (hereafter Phillips interval). For our purposes, this
relation highlights a time interval in the light curve where
important information is stored. However, at this point we
cannot say if a data set sampled solely in this time interval
can provide enough information. As a consequence, we
considered 8 different sub-sets of post-SNPCC data, whose
parameters are described in Table 1. These requirements were
imposed on training and test samples within a given data set.

$D_1$ to $D_4$ probe the light curve so as to include the Phillips
interval. $D_5$ and $D_6$ aim at testing the KPCA+1NN procedure
in a region of the light curve that was not explored in
the SNPCC: with points only before maximum. Although
this kind of classification does not result in cosmologically
useful SNe Ia, it is very important in pointing out candidates
for spectroscopic follow-up (Kessler et al. 2010). $D_7$ and $D_8$
are tailored to include the second maxima in infra-red bands
expected to occur after 20 days since maximum brightness
(Kasen 2006).

In Table 1 we varied not only the maximum and minimum
epoch of observation, but also considered different values
for $\Delta$. The purpose of this analysis is to investigate if
the classification results are sensitive to the step size between
different columns of the kernel matrix. We expect this result
to be correlated with data quality, since the interpolated
functions are influenced by errors in flux measurements. To
test this hypothesis, we applied the classification procedure
to different sub-samples of each data set, according to their
SNR.

Finally, we only considered SNe with at least 3 observational
epochs above a certain SNR threshold in each filter.
As the spline fitted functions are supposed to capture the overall
behaviour of a smooth light curve, this selection cut ensures
that at least 3 of the points with higher weights in the spline
fitting procedure correspond to good quality measurements.
We also present results without a SNR selection cut, referred
to as SNR$\geq$0.



5.2 Results 

In order to choose a filter as our reference band, we used the
SNe Ia in the training sample of SIM1 and post-SNPCC. As
our primary goal is to correctly separate a sample containing
only type Ia, our decision was based on the results from
SNe Ia in the spectroscopic sample only. Interpolated light
curve functions before normalization were used to determine
the time of peak brightness in all bands. The residuals between
the simulated and derived date of maximum brightness,
$\Delta t_{\rm max}$, in each band were computed for all SNe Ia
in the training samples. This resulted in a distribution of
points whose spread represents our ability (or lack of it) in
determining this parameter for each filter. The mean values
and standard deviations encountered are shown in Table 2.

From this we realized that the r band is the best choice
for determining the time of peak brightness, since it has the
least biased mean value with the smallest standard deviation.
Such results agree with those found in R2012, also based on
SNANA simulations, but with a different argument. All the
results presented from now on were calculated using the time
of peak brightness in r band as reference.
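The band selection can be reproduced from the SIM1 column of Table 2 with a few lines of Python. The ranking rule used here (smallest absolute mean first, smallest standard deviation as tie-breaker) and the sign convention of the residual are assumptions made for this illustration. The g row label is inferred from the DES filter order.

```python
from statistics import mean, stdev

def residual_stats(simulated_peak, derived_peak):
    """Mean and standard deviation of (derived - simulated) peak dates."""
    residuals = [d - s for s, d in zip(simulated_peak, derived_peak)]
    return mean(residuals), stdev(residuals)

def best_reference_band(stats):
    """Pick the band with the least biased mean, then the smallest spread."""
    return min(stats, key=lambda band: (abs(stats[band][0]), stats[band][1]))

# SIM1 values from Table 2: filter -> (mean, standard deviation).
sim1 = {"g": (-3.7, 3.5), "r": (-0.1, 2.6), "i": (1.9, 2.8), "z": (1.2, 3.6)}
reference_band = best_reference_band(sim1)

# Toy check of the statistics helper on three hypothetical SNe.
toy_mean, toy_std = residual_stats([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

With the SNPCC column of Table 2 the same rule also selects the r band.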

The final classification results are reported in terms
of efficiency (eff), purity (pur) and successful classification
(SC) rates,

$${\rm eff} = \frac{N^{\rm sc}_{\rm Ia}}{N^{\rm tot}_{\rm Ia}}, \qquad (16)$$

$${\rm pur} = \frac{N^{\rm sc}_{\rm Ia}}{N^{\rm wc}_{\rm nonIa} + N^{\rm sc}_{\rm Ia}}, \qquad (17)$$

$${\rm SC} = \frac{N^{\rm sc}_{\rm Ia} + N^{\rm sc}_{\rm nonIa}}{N^{\rm TOT}}, \qquad (18)$$

where $N^{\rm sc}_{\rm Ia}$ ($N^{\rm sc}_{\rm nonIa}$) is the number of
successfully classified SNe Ia (non-Ia), $N^{\rm tot}_{\rm Ia}$ is the
total number of SNe Ia, $N^{\rm wc}_{\rm nonIa}$ is the number of non-Ia
wrongly classified as Ia and $N^{\rm TOT}$ is the total number of SNe
which survived selection cuts.

Figure 5. Summary of classification results. Panels display pur,
eff$_A$, SC, FoM$_A$ and final sample size, from top to bottom.
The horizontal axis runs through the data samples described in table 1.
Results are displayed for SNR$\geq$5 (red circles), SNR$\geq$3 (blue squares)
and SNR$\geq$0 (green diamonds).

Efficiency values are shown for two different normalizations:
eff$_B$ takes $N^{\rm tot}_{\rm Ia}$ as the total number of SNe Ia before
any selection cuts, and eff$_A$ was calculated using the total
number of SNe Ia remaining after selection cuts$^{11}$. The
definition used in the SNPCC corresponds to eff$_B$, and aims at
addressing the impact on the final sample not only due to the
classifier, but also to the selection cuts used. In our particular
case, we chose to display values of eff$_A$ in order to isolate
the classification power of the algorithm itself. As stated before,
our results are mainly influenced by the quality of each
observation. Beyond that, we made a specific choice of not
extrapolating the light curve where data is not present (table 1).
As a consequence, we consider our selection cuts as
a minimum amount of information necessary to coherently
compare different light curves without the need of further ad
hoc hypotheses. In this scenario, the use of eff$_A$ gives a better
idea of the classifier performance. However, when comparing
with previous analyses from the literature, eff$_B$ should
be referred to. From now on, for all our results that can be
compared to previous ones, both quantities are shown.$^{12}$

11 By selection cuts we mean the SNR requirement for each
sub-sample plus the time window constraints described in table 1.

Figure 6. Number of SNe as a function of their fit probability calculated from MLCS2k2. Panels show histograms for SNR$\geq$5, SNR$\geq$3,
SNR$\geq$0 and SNANA cuts, from left to right. Also shown are the classification outcomes based on FitProb (SNe with FitProb>0.1 were
tagged as Ia and the remaining ones were tagged as non-Ia).

By definition, eff measures our capacity to recognize
the SNe Ia, while pur measures the contamination from non-Ia
SNe in our final sample. SC values are presented in order
to provide an overall picture of our classification results
regarding non-Ia as well.

In order to make our results easier to compare with
other analyses from the literature, we also report them in
terms of the figure of merit (FoM) and pseudo-purity (ppur),
used to rank classifiers in the SNPCC,

$${\rm ppur} = \frac{N^{\rm sc}_{\rm Ia}}{N^{\rm sc}_{\rm Ia} + W N^{\rm wc}_{\rm nonIa}}, \qquad (19)$$

$${\rm FoM} = {\rm eff} \times {\rm ppur}, \qquad (20)$$

where W is used to impose a stronger penalty on non-Ia
contaminating the final SNe Ia sample. Following the SNPCC,
we used W = 3. Given that FoM is a function of efficiency,
we report values for FoM$_A$ and FoM$_B$ for the total number of
SNe after and before selection cuts, respectively.
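Equations (16) to (20) translate directly into code. The sketch below is only an illustration. The function name, the label strings and the toy sample are hypothetical, and `n_ia_before_cuts` stands for the number of SNe Ia before selection cuts, needed for the B normalization.

```python
def classification_metrics(true, predicted, n_ia_before_cuts, w=3.0):
    """Diagnostics of equations (16)-(20) for SNe that survived selection cuts."""
    pairs = list(zip(true, predicted))
    n_sc_ia = sum(1 for t, p in pairs if t == "Ia" and p == "Ia")
    n_sc_non = sum(1 for t, p in pairs if t == "nonIa" and p == "nonIa")
    n_wc_non = sum(1 for t, p in pairs if t == "nonIa" and p == "Ia")
    n_ia_after_cuts = sum(1 for t in true if t == "Ia")

    eff_a = n_sc_ia / n_ia_after_cuts          # normalised after cuts
    eff_b = n_sc_ia / n_ia_before_cuts         # normalised before cuts
    pur = n_sc_ia / (n_sc_ia + n_wc_non)
    ppur = n_sc_ia / (n_sc_ia + w * n_wc_non)  # W penalises contamination
    sc = (n_sc_ia + n_sc_non) / len(pairs)
    return {"eff_A": eff_a, "eff_B": eff_b, "pur": pur, "ppur": ppur,
            "SC": sc, "FoM_A": eff_a * ppur, "FoM_B": eff_b * ppur}

# Toy sample: 4 SNe Ia and 6 non-Ia pass the cuts, out of 10 SNe Ia simulated.
true = ["Ia"] * 4 + ["nonIa"] * 6
predicted = ["Ia", "Ia", "Ia", "nonIa", "Ia"] + ["nonIa"] * 5
metrics = classification_metrics(true, predicted, n_ia_before_cuts=10)
```

In this toy case one contaminating non-Ia lowers pur to 0.75 but ppur to 0.5, which shows the effect of W = 3.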

5.2.1 SIM1 

We must now prepare the light curves according to the
prescription described in section 4. We randomly selected one
example of a type Ia light curve in SIM1 to illustrate how the
fitted functions behave given the data points. This is shown
in the left panels of figure 1. The right panels show the
light curve functions for different types of non-Ia SNe. Panels
from top to bottom run over the DES filters {g, r, i, z}. In
order to facilitate visualization, all curves were normalized
as explained in section 4.

For the SIM1 data set, the cross-validation procedure
returns PCs 1 and 5 along with $\sigma = 0.3$ as the most
appropriate parameter values. The final geometrical distribution
of the training sample in such PC parameter space, along
with the classification results, is shown in figure 2. In order
to facilitate visualization, we show the Ia and non-Ia SNe in
the test sample in two different plots.

We can see that, in a best case scenario, the KPCA+1NN
algorithm is efficient enough to separate the two populations
in feature space with a minimum loss in the number of SNe
Ia (up to 94% eff$_A$) and almost no contamination from non-Ia
SNe in the final sample (up to 99% pur).



5.2.2 Post-SNPCC data 

The analysis of the post-SNPCC data was performed in
different steps. We first separate a sub-sample which can be
considered the analogue of SIM1 inside post-SNPCC, $D_1$ with
SNR$\geq$5 (hereafter $D_1$+SNR5). This data set results from
imposing on post-SNPCC data the same selection cuts applied
to SIM1.

Using $D_1$+SNR5, we obtained 89% (80%) pur and SC
of 92% (94%) in the training (test) sample. The graphical
representation of results from $D_1$+SNR5 is shown in the
upper and middle panels of Figure 3 and the redshift distribution
of the diagnostic parameters is displayed in figures
4 and C2.

Analysing the geometrical distribution of training
sample data points (blue circles and purple squares), the
numerical results mentioned above become clearer.
There is an obvious distinction between the preferential
locus occupied by Ia and non-Ia in this parameter space.
However, besides the overlapping area where both species
exist, and which was already present in SIM1, we can also
spot some contamination of non-Ia points inside the area
occupied by Ia. Such "misplaced" non-Ia probably gave
rise to an important share of the wrong cross-validation
classifications. In what follows, we describe 2 different
approaches aimed at suppressing the influence of these
"problematic" data points.

The Untyped supernova 



12 For the sake of clarity, when both quantities are present
(results that might be compared with others from the literature),
outcomes normalized after selection cuts are shown in appendices
C and E.



Let us focus on $D_1$+SNR5 for a moment. Each data
point in the training sample is characterized by the SN
identification number, its coordinates in PC1 × PC4 space, the
true label and the label from cross-validation. We identified
all points which received a wrong label in the cross-validation
process and gathered them in a set U. We considered these
troubled points, in the sense that, although they are
spectroscopically confirmed SNe, their light curve characteristics
are not enough to fully distinguish them within the training
sample.

Figure 7. Classification results obtained for the sub-sample
of SNe with FitProb>0.1 using different time windows. Red
circles, blue squares, green diamonds and gray triangles
correspond to KPCA+1NN results when SNR$\geq$5, SNR$\geq$3, SNR$\geq$0
and SNANA cuts are applied, respectively. Horizontal red (dotted),
blue (dashed), green (dot-dashed) and gray (full) lines
correspond to the results from the FitProb criteria for the same set of
cuts. Panels show eff$_A$, pur, FoM$_A$, SC and the percentage of SNe
Ia passing time window requirements from top to bottom.

Our first attempt was to remove all points $\in U$ from
the training set before classifying the test sample. In doing
so, we defined that a new unlabelled test point would be
classified according to the region in the parameter space it
occupies, since removing the troubled points defines a clear
geometrical boundary between Ia and non-Ia regions in PC
parameter space. This slightly increased our ratings, leading
to 87% pur, 93% eff$_A$, and 96% SC rates.

Trying to get rid of the remaining contamination as
much as possible, we consider the complete training sample
with 3 different SNe types: Ia, non-Ia and untyped SNe
(U). This allows us to take advantage of the information
in the troubled points and identify light curves similar to
them. An expected consequence of this choice is a decrease
in efficiency, since some of the Ia in the test sample will be
classified as U. On the other hand, as the loss of SNe to
the U class happens to non-Ia as well, the pur in our final
Ia sample will increase, reaching 91% for $D_1$+SNR5.

The training set divided in 3 sub-samples has its graphical
representation shown in the bottom panel of figure 3. For
all the cases described here (complete training, excluding U
from the training set and including U as a classification type)
the distribution of test points will not change, since only the
training sample is affected.
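The construction of the untyped class can be sketched as follows. Training points whose leave-one-out 1NN label disagrees with the true label are relabelled as U, and the test points are then classified against the three-class training set. The function names and the toy coordinates in PC space are hypothetical.

```python
import math

def nearest_label(point, train_points, train_labels, skip=None):
    """1NN label of `point`; `skip` excludes one training index (leave-one-out)."""
    candidates = (i for i in range(len(train_points)) if i != skip)
    best = min(candidates, key=lambda i: math.dist(point, train_points[i]))
    return train_labels[best]

def add_untyped_class(train_points, true_labels):
    """Relabel as 'U' every training point misclassified in cross-validation."""
    cv_labels = [nearest_label(p, train_points, true_labels, skip=i)
                 for i, p in enumerate(train_points)]
    return [t if t == c else "U" for t, c in zip(true_labels, cv_labels)]

# Toy projections in a PC pair: one non-Ia sits close to the Ia locus.
train_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.3, 0.3)]
true_labels = ["Ia", "Ia", "Ia", "nonIa", "nonIa", "nonIa", "nonIa"]

three_class_labels = add_untyped_class(train_points, true_labels)
label_near_ia = nearest_label((0.02, 0.02), train_points, three_class_labels)
label_near_u = nearest_label((0.28, 0.30), train_points, three_class_labels)
```

Dropping the points labelled U from `train_points` instead of keeping them reproduces the first approach described above.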

We performed the classification for all samples described
in table 1, imposing 3 different SNR cuts (namely
SNR$\geq$5, SNR$\geq$3 and SNR$\geq$0). A summary of our findings is
detailed in table F2.

Figure 5 shows results for the samples listed in the above
mentioned table for the case where the U class was included
in the training sample as a third SNe type$^{13}$. It is clear from
this plot that pur and FoM results become more dependent
on time sampling choices as SNR goes higher. The extreme
cases are samples $D_5$/$D_6$ (before maximum, worst results)
and $D_7$/$D_8$ (wider time sampling, better results).

Finally, we should emphasize that our analysis was
based on the idea that information should be stored somewhere
in the light curve function. If this is true, KPCA could
easily provide a direction of information clustering
in some untouched feature space, which could be accessed
through the data point projections in the PCs. That was
the main reason why we started our analysis based on SNR
requirements. Errors in flux measurements are directly
correlated to the SNR of each observation, and higher errors
lead to more oscillations in the light curve functions. In the
extreme case where we use random numbers as components
of an input data vector (which contains no information), its
projections in the PCs will always be located very close to the
origin.

Results shown in Table F2 reflect this main idea. Requiring
SNR$\geq$5 in $D_1$ to $D_4$, we obtained pur, eff$_A$, and SC
rates higher than 80% in all 4 cases. These samples contain
approximately 5 times more non-Ia than Ia SNe (see Table
F3), which is close to what we expect in a real survey. Beyond
that, we did not demand representativeness in redshift
or SNe types between the test and training samples. The
training sample inside the post-SNPCC data has all the
biases the organizers were able to predict and which come
along with spectroscopic observational conditions (Kessler
et al. 2010). The selection cuts we applied to SNR, in this
context, can be seen as a simple procedure to extract the
full potential of a given data set$^{14}$.

The results presented here are in agreement with those
found by R2012, who applied a diffusion map and random
forest algorithm to the same data set. Using the spectroscopic
sample as given in the post-SNPCC as a training set,
they found 56%/48% for pur/eff$_B$ values. Our analysis for
$D_8$+SNR0, which imposes no SNR selection cuts, returns
43% pur and 35% eff$_B$. In their scenario achieving higher
purity, they report 90% pur and 8% eff$_B$ from a redshift
limited training sample (R2012, Table 6). For $D_8$+SNR5,
our method achieved 98% pur and 7% eff$_B$. However, we
emphasize that while R2012 uses a different prescription for
constructing the training sample, our results were reached
using a subset of the spectroscopic sample as it is presented
in the SNPCC.

13 This plot was constructed with the goal of maximizing SC.
However, we also applied the cross-validation process of section
3.2 aiming at maximum FoM and the results are very similar.

14 We remind the reader that the SNR selection cuts are applied
to both training and test samples.

Focusing on sample $S_{m,25}$ of R2012 and $D_1$+SNR5
of our method, the first feature to call attention is the
exponential decay in our results for eff$_B$. It will be clear in
what follows that this is a consequence of SNR cuts (figure
E1). In this particular case, we required at least 3 epochs
with SNR$\geq$5 in each filter and, with higher redshift,
SNe fulfilling this requirement become rare. Also, in
the present analysis, we keep only SNe with observations in
all available filters, which prevents us from classifying any
object with $z \geq 0.8$ (see upper redshift end of our results
in figure 4). Obviously these are not intrinsic characteristics
of the method, or the data, but choices we made in order
to keep results in a conservative perspective. Nevertheless,
our values for eff$_B$ are comparable to those of R2012 up to
$z \approx 0.4$ (see figure 10 of R2012).

As a consequence, despite the loss in efficiency for the
reasons cited above, the local maximum in FoM$_B$ achieved
by both groups, us and R2012, is FoM$_B \sim 0.5$, with our
method providing higher results up to $z \sim 0.5$.

It was not our purpose to construct a different observation
strategy but, instead, to show that if a photometric
survey was able to provide a sample similar to post-SNPCC
today, it would be possible to extract a photometrically classified set
containing approximately 15% of the entire sample (more
than 2000), with SC$\geq$90%. Beyond that, such results can
be achieved with minimum astrophysical input and no a
priori hypothesis about light curve shape, colour, SNe host
environment or redshift.

Results from Linear PCA

Given the widespread use of linear PCA in astronomy,
we also verified how the standard linear version of PCA
performs in the SNe photometric classification problem. The
method described in section 2.1 was applied to the
post-SNPCC data. Once the PCs and projections were calculated,
we used a cross-validation algorithm similar to that
presented in section 3.2. The main difference is that, in
the linear case, there is no parameter $\sigma$ to determine.

We present results for $D_1$ in appendix B. As expected,
when no SNR cut is applied, linear PCA and KPCA achieved
similar rates of pur and eff$_A$ (table B1). However, when data
quality increases, linear PCA is not able to take advantage
of the small details introduced in the light curve function.
Results from linear PCA applied to $D_1$+SNR5 and including
U in training achieved maximum values of 73% pur, 56%
eff$_A$, and 79% SC. Comparing tables B1 and F2, we find that
using KPCA for such a case improves results of pur, eff$_A$, and
SC by 25%, 50% and 15% respectively, over the linear PCA
outcomes. The dependence of these results on redshift is
displayed in figures 4 (for KPCA applied to $D_1$+SNR5) and
B2 (for the linear case).


5.3 A tougher scenario 

In order to make a harder test of the classification power
of KPCA+1NN, we used the MLCS2k2 light curve fitter within
SNANA to exclude easily recognizable non-Ia light curves
from the test sample. Once the "obviously" non-Ia are
eliminated from the test sample, we are left with a data set
containing light curves more similar to each other. If
we are able to improve the MLCS2k2 successful classification
rates within this sub-sample, we can be sure that the
algorithm is doing more than just identifying very strange
light curves. We shall see this is the case$^{15}$.

We begin by choosing a selection cut. For each light
curve surviving this cut we calculated the fit probability of
being a SN Ia (FitProb) as implemented in SNANA. Those
with FitProb>0.1 were tagged as Ia and the remaining ones
were classified as non-Ia. Figure 6 shows the number of SNe
according to the calculated FitProb for 4 different selection
cuts. Beyond the 3 SNR cuts mentioned previously, we also
analysed the outcomes of those used by the SNANA cuts
entry submitted to the SNPCC (Kessler et al. 2010). These
are defined as: at least 1 observation epoch before maximum
brightness, at least 1 epoch after +10 days, at least 1
epoch with SNR$\geq$10 and filters {r, i} should have maximum
SNR$\geq$5. Panels also show results for pur, ppur, eff$_A$, eff$_B$,
FoM$_A$ and FoM$_B$ obtained from classifying the entire samples
according to FitProb. In this plot, it is evident that, no
matter which selection cut we choose, there is a high
concentration of SNe with FitProb < 0.1. This reflects the fact
that such a group of high quality non-Ia light curves is most
obviously different from standard SNe Ia, and was responsible
for a significant part of our SC rates in the previous
sub-section. Analysing the efficiency values, we see that only
~10% of type Ia SNe are wrongly classified as non-Ia according
to the FitProb criteria.
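The FitProb tagging rule and the construction of the harder test sample amount to a threshold followed by a selection. The sketch below only illustrates that bookkeeping. The FitProb values are made-up toy numbers (in practice they come from the MLCS2k2 fit within SNANA) and the function names are hypothetical.

```python
def tag_by_fitprob(fit_probs, threshold=0.1):
    """Tag as 'Ia' every light curve with FitProb above the threshold."""
    return ["Ia" if p > threshold else "nonIa" for p in fit_probs]

def harder_test_sample(sn_ids, fit_probs, threshold=0.1):
    """Keep only the SNe tagged as Ia; these form the new test sample."""
    tags = tag_by_fitprob(fit_probs, threshold)
    return [sn for sn, tag in zip(sn_ids, tags) if tag == "Ia"]

sn_ids = ["sn1", "sn2", "sn3", "sn4"]    # toy identifiers
fit_probs = [0.85, 0.02, 0.10, 0.40]     # toy FitProb values
tags = tag_by_fitprob(fit_probs)
subset = harder_test_sample(sn_ids, fit_probs)
```

Within `subset` the FitProb criteria has maximum efficiency by construction, since every object is tagged as Ia.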

We now separate only the SNe classified as Ia according
to the FitProb criteria for each selection cut and consider
these our entire test sample. After that, we re-calculated the
FitProb results and ran the KPCA+1NN classifier. For the
SNANA cuts entry, no extra SNR cuts were applied. Results
for different time windows are shown in figure 7. From
this plot, it is evident that, when no SNR cuts are applied,
both methods return very similar results for pur. The FoM$_A$
obtained from the FitProb criteria is higher than that
obtained with our method, due to its maximum efficiency
in this context (all SNe tagged as Ia). The main difference
appears when results for higher SNR are compared. For
SNR$\geq$3, our method is able to increase pur results from
pur ~ 70% to pur > 90% without using any kind of
astrophysical information.

In order to have a better idea of how demanding the 



15 We emphasized that the reader should not consider the clas- 
sification results using our method and those based on MLCS2k2 
in the same grounds. The procedure used to obtain FitProb val- 
ues uses information about spectroscopic redshift, and as a con- 
sequence, it cannot be considered a photometric classification 
method. 
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Figure 8. Classification results from pre-maximum observa- 
tions with SNR^5 (D5+SNR5) and considering U as a clas- 
sification type. The colour code is the same used in figure [3] 

Figure 9. Classification results for t_low = -10 up to the last point before maximum brightness ([-10, 0[+SNR5) and considering U as a classification type. The colour code is the same used in figure 3.




Figure 10. Results for pur, eff_A, SC and FoM as a function of redshift for pre-maximum data (D5+SNR5) and including U class in the training sample. The top right panel also shows the fraction of SNe classified as U. The colour code is the same used in figure 4.

Figure 11. Redshift dependence results for [-10, 0[+SNR5 and including U class in the training sample. The panels show the same quantities described in figure 10. The colour code is the same used in figure 4.

time sampling is on the SNe Ia sample which already passed the selection cuts, we show in the bottom panel of figure 7 the fraction of SNe Ia that fulfils such requirements. These results are quite similar and almost independent of selection
cuts. For D\ to Da around 70% of SNe la were classifiable 
and for Dj and D$ around 60%. 



5.4 Pre-maximum observations 

We also explored the ability of KPCA+1NN to classify light curves given only observation epochs before maximum brightness, a proposal that was submitted to the participants of the SNPCC but did not receive any reply. Although this kind of analysis does not produce a SN sample useful for cosmology, it is extremely important in pointing out candidates for spectroscopic follow-up.

In a first approach, the light curves were treated as described in section 4. Once the spline fitted functions and time of maximum were obtained, we constructed the data matrix, G, with time sampling between -10 and 0 days since maximum brightness (D5 and D6 in table 1). We emphasize that this scenario uses points after maximum in order to determine t_max, but not in the construction of matrix G. The more realistic situation, where the points after maximum are not used in any step of the process, is also analysed below.

For D5+SNR5 and D5+SNR0, results are shown in figures 8 and 12, respectively. Figure 8 is similar to figure 3 in the sense that both present a clear separation between Ia and non-Ia points in the training sample and the Ia in the test sample seem to obey that boundary (upper panels). On the other hand, when non-Ia points from the test sample are superimposed, they occupy almost the entire populated region of the parameter space.

In figure 12 the situation changes completely. The effect mentioned previously, in which data vectors with low information content are located close to the origin in PC space, translates into an over-density of points in this area. Beyond that, we also see that the difference between the Ia and non-Ia distributions is not that clear any more. There is a slight tendency for the non-Ia points to agglomerate along the vertical axis, but this entire area is also occupied by Ia. The plot also indicates that the amount of relevant information contained in Ia input vectors is larger than that in non-Ia, since the spread in the former is much larger than in the latter. Classification results for D5+SNR5 (D5+SNR0) achieved 61% (38%) pur, 73% (44%) eff_A and 83% (58%) SC (see footnote 16), which leads to a FoM_A of 0.25 (0.07).
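
The figure of merit quoted above can be cross-checked from the purity and efficiency alone. The sketch below assumes the SNPCC definition, FoM = efficiency × pseudo-purity, where the pseudo-purity penalises each false positive by a weight W = 3 (W = 1 gives the true purity). The function names are illustrative and are not part of the original analysis.

```python
def pseudo_purity(purity, weight=3.0):
    """Convert true purity (W = 1) into pseudo-purity with false-positive weight W."""
    # purity = N_true / (N_true + N_false)  =>  N_false / N_true = (1 - purity) / purity
    false_per_true = (1.0 - purity) / purity
    return 1.0 / (1.0 + weight * false_per_true)


def figure_of_merit(purity, efficiency, weight=3.0):
    """SNPCC-style figure of merit: efficiency times pseudo-purity."""
    return efficiency * pseudo_purity(purity, weight)


# Purity and eff_A quoted in the text for D5+SNR5 and D5+SNR0
fom_snr5 = figure_of_merit(0.61, 0.73)
fom_snr0 = figure_of_merit(0.38, 0.44)
print(round(fom_snr5, 2), round(fom_snr0, 2))
```

Under this assumption the two values round to 0.25 and 0.07, in agreement with the FoM_A reported for these samples.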

We now turn to a more restrictive situation. Although very promising, results for D5 and D6 were not obtained using strictly only pre-maximum data, since the entire light curve was used to determine t_max (section 4). In order to analyse a more realistic scenario, we also studied the classification outcomes when points after maximum are removed from the process of determining t_max.

For each light curve in the post-SNPCC we took just 
epochs observed before the simulated time of maximum 



16 It is important to emphasize that, given that the training sample contains many more non-Ia than Ia, a 50% SC does not correspond to the outcome of a random decision making process.




Figure 12. Classification results from pre-maximum observations for D5+SNR0. The colour code is the same used in figure 3.



brightness (see footnote 17). The spline fit was then applied to these data points and the time of maximum was defined by the r-band as before. If in any other filter the last observed data point corresponds to an earlier epoch than t_max in r-band, we extrapolated the light curve function until it reaches t_max. We performed classification for Δ = 1, 3 and in both cases t_low was kept as -10. After the curves were obtained, we followed the construction of the data matrix G and the KPCA+1NN algorithm as explained before. In what follows, this data sample is tagged as [-10, 0[.
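
As an illustration of how one row of the data matrix G might be assembled for the [-10, 0[ sample, the sketch below samples a fitted light curve in each filter on a grid that starts at t_low = -10 days and advances in steps of Δ up to, but excluding, the time of maximum. It is only a sketch under stated assumptions: the half-open grid, the griz filter names, the toy linear "fits" standing in for the spline functions, and the helper names `time_grid` and `data_matrix_row` are all hypothetical, and any flux normalisation is omitted.

```python
import math


def time_grid(t_low, t_up, delta):
    """Epochs in days since maximum, covering the half-open window [t_low, t_up[."""
    n_steps = math.ceil((t_up - t_low) / delta)
    return [t_low + k * delta for k in range(n_steps)]


def data_matrix_row(flux_by_filter, t_max, t_low=-10.0, delta=1.0):
    """Concatenate the fitted flux of every filter, sampled before maximum."""
    grid = time_grid(t_low, 0.0, delta)
    row = []
    for band in sorted(flux_by_filter):      # fixed filter order for every SN
        fitted = flux_by_filter[band]        # callable: MJD -> flux (stand-in for the spline)
        row.extend(fitted(t_max + t) for t in grid)
    return row


# Toy light curves: one linear "fit" per filter, purely illustrative
toy_fits = {band: (lambda mjd, s=slope: s * mjd)
            for band, slope in {"g": 1.0, "r": 2.0, "i": 3.0, "z": 4.0}.items()}
row_delta1 = data_matrix_row(toy_fits, t_max=100.0, delta=1.0)
row_delta3 = data_matrix_row(toy_fits, t_max=100.0, delta=3.0)
print(len(row_delta1), len(row_delta3))
```

A coarser step Δ shortens each row, so the same light curve contributes fewer features to the KPCA step.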

Differences between the time of maximum brightness determined using the entire light curve and using only pre-maximum data are shown in figure 13. Classification results for [-10, 0[+SNR5 are shown in figures 9 (Δ = 1) and 15 (Δ = 3), and numerical results for other cases are displayed in table F2. Comparison with results from D5+SNR5 (figure 8) shows that, although pur and efficiency remain almost unchanged, there is a larger number of non-Ia classified as U. The U type SNe, in this case, act like a barrier between Ia and non-Ia regions, such that expanding the area covered by non-Ia (by adding slightly noisier data) causes them to be classified as U before pur levels are diminished. However, this barrier only works up to a certain point.

Classification results for D6 (figure 14) and [-10, 0[ with Δ = 3 (figure 15), both satisfying SNR≥5, reflect this point. The determination of the time of maximum brightness is the only difference between these two data sets, and yet it is already enough to lower the classification results significantly, a feature that was not verified among the D_i samples (figure 5). This demonstrates the importance of a correct determination of the time of maximum brightness. The redshift dependent results for these 2 instances of the data are displayed in figures 16 (D6+SNR5) and 17 ([-10, 0[+SNR5, with Δ = 3).

These results are very encouraging. They mean that, in the context of future DES data, the algorithm can correctly classify approximately 75% of the initial data sample using only pre-maximum data, if the entire data set was given at once. But in a real situation this can be improved. Suppose that initially, our training sample is composed of the spectroscopic SNe sample available today. As time goes by and pre-maximum light curves are observed, they are automatically classified. An example strategy would be to target with spectroscopic observations the light curves whose projections in PC feature space lie on the boundaries of the SNe Ia/non-Ia regions. Once the SN type is confirmed, it can be added to the training sample, improving future classification results.



17 SNANA variable: SIM_PEAKMJD.

Figure 13. Number of SNe as a function of the difference between the time of maximum brightness determined using the full light curve (samples D_i) and using only points before maximum brightness ([-10, 0[). The upper panel shows the histogram for the training sample and the lower panel corresponds to test sample outcomes.

Figure 14. Classification results for D6+SNR5. The colour code is the same used in figure 3.

Figure 15. Classification results for [-10, 0[+SNR5 with Δ = 3. The colour code is the same used in figure 3.

Figure 16. Results for eff_A, pur, FoM and SC as a function of redshift for D6+SNR5. The colour code is the same used in figure 4.

Figure 17. Results for eff_A, pur, FoM and SC as a function of redshift for [-10, 0[+SNR5 and Δ = 3. The colour code is the same used in figure 4.



6 SNPCC SAMPLE

In order to allow a direct comparison of our results with those reported in the SNPCC, we also applied the KPCA+1NN algorithm to the data set used in the competition. This consists of 20216 simulated light curves, of which 1105 represent the spectroscopic sample. This data can be considered less likely to represent the future DES data, given that all bugs listed as fixed "after SNPhotoCC" in table 4 of Kessler et al. (2010) are still part of this sample. However, the application is instructive to have an idea of how our method performs when compared to other algorithms.

Results for FoM_B, eff_B, ppur (W=3) and pur (W=1) are shown in figure E1 for different SNPCC sub-samples and SNR cuts. This should be compared to figure 5 of Kessler et al. (2010), which reports results from different classifiers without using host galaxy photometric redshift. A detailed analysis of the multiple panels in figure E1 is presented in appendix E.

Our findings from this sample can be summarized through the items below:

• There is a weak dependence of the overall classification results on particular time sampling choices. The only eye-catching difference comes from the time window including the second maximum in the infrared (D7).

• Results are highly dependent on SNR cuts, especially efficiency and, consequently, FoM.

• D7+SNR5 achieved FoM_B > 0.25 for 0.2 ≤ z < 0.4, a result only achieved by 3 of the entries participating in the SNPCC (namely Sako, JEDI-KDE and SNANA).

• Our method achieved outstanding pur and ppur results for z ≥ 0.2. In this redshift range, all samples with SNR≥5 reported pur values larger than 75%, a result that was not obtained by any of the SNPCC entries. In particular, in 0.2 ≤ z < 0.4, D7+SNR≥5 obtained 94% ≤ pur < 97%, while keeping a moderate FoM_B. The redshift dependence of these results is displayed in figure 18.



7 CONCLUSION 

Current SNe surveys already have at hand many more SNe light curves than it is possible to confirm spectroscopically. This imbalance will grow tremendously in the next decade, which makes SNe Ia photometric identification a crucial issue. In this work, we propose the use of KPCA combined with the k = 1 nearest neighbour algorithm (KPCA+1NN) as a framework for SNe photometric classification.

Lately, a large effort has been applied to the SNe photometric classification problem. An up to date compilation of those efforts is reported in Kessler et al. (2010), known as the SuperNova Photometric Classification Challenge (SNPCC). It consisted of a blind simulated light curve sample as expected for the Dark Energy Survey (DES), to be used as a test ground for different classifiers. Although there were some fundamental differences between the algorithms submitted, none of the entries performed obviously better than all the others. After the results were reported, the organizers made public an updated version of the simulated data (post-SNPCC). Both samples, SNPCC and post-SNPCC, were analysed in this work.

Figure 18. Classification results for D7+SNR5 from the SNPCC sample (original SNPCC data set) compared to results reported by the group achieving highest FoM in the SNPCC (Sako). Panels show true purity, pseudo-purity and FoM_B from left to right. Blue (red) lines correspond to results from KPCA+1NN when applied to spectroscopic/training (photometric/test) samples. The gray region corresponds to results reported by the group which achieved the best overall classification results in the SNPCC, without using host galaxy photometric redshift information (Kessler et al. 2010).

Our method fits in the class of statistical inference algorithms, according to the SNPCC nomenclature. All calculations are done in the observer frame. There are no corrections due to reddening, local environment, redshift or observation conditions and all available spectroscopically confirmed data surviving quality selection cuts should be used to shape the PCs feature space. The dimensionality reduction is performed using only spectroscopically confirmed SNe (training sample) and each new unlabelled light curve (test sample) is classified one at a time. This allows us to avoid introducing noisy information from non-confirmed SNe in the classifier training. The algorithm is built so that once a new spectroscopic light curve is available or we have total confidence in a photometric one, it can easily be included in the training process, but it is not necessary to redefine the PC feature space every time a new point is to be classified.
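
The one-at-a-time classification described above can be pictured with a minimal nearest neighbour sketch in a fixed two-dimensional PC space. The projections and labels below are made-up numbers used only for illustration, and the helper name `classify_1nn` is hypothetical; the KPCA projection itself is assumed to have been computed beforehand.

```python
import math


def classify_1nn(point, training):
    """Return the label of the training projection closest to `point`."""
    nearest = min(training, key=lambda item: math.dist(item[0], point))
    return nearest[1]


# Hypothetical projections of spectroscopically confirmed SNe in (PC1, PC2)
training = [
    ((0.30, 0.10), "Ia"),
    ((0.25, 0.05), "Ia"),
    ((-0.20, 0.40), "non-Ia"),
]

# A new photometric light curve, already projected onto the same two PCs
new_point = (0.28, 0.02)
label = classify_1nn(new_point, training)
print(label)

# Once the type is confirmed, the object simply joins the training list;
# the feature space itself does not have to be rebuilt for the next object.
training.append((new_point, label))
```

Because each test object is compared only with the confirmed sample, classifying one light curve never changes the result obtained for another.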

In designing our method, we prioritized purity in the final SNe Ia sample, since it is the most important characteristic of a data set to be used for cosmology. We also decided to take a conservative approach towards the unknown features of the data. As a consequence, no extrapolation in the time or wavelength domain was used and we demanded that each SN was observed in all available filters. As expected, these choices have a great impact on our efficiency results. However, we believe that the high purity levels achieved justify our choices (figure E1), especially in a context where there are already observed light curves not being used for cosmology due to lack of classification (Sako et al. 2011).

We highlight that we chose not to include high complexity in the different steps along the process in order to keep the focus on the KPCA performance. However, as remarked before, there is plenty of room for improvement, for example in choosing the kernel function, the order of the nearest neighbour algorithm and studying more flexible selection cuts. Such developments are worth pursuing, but one should also be careful not to fine-tune the procedure too much, otherwise the results will apply only to one specific data set. Quantifying the dependence of our results on such choices is out of the scope of this work.

Results presented in this work show that the KPCA+1NN algorithm provides excellent purity in the final SNe Ia sample. Although a time window since maximum brightness needs to be defined, its width does not have a large impact on the final classification results. On the other hand, the SNR of each observation epoch plays a crucial role. As a consequence, our best results are mainly concentrated in the intermediate redshift range, 0.2 ≤ z < 0.4. From the SNPCC sample analysis in these redshifts, our method returned FoM_B > 0.25, using D7+SNR5 (figure 18), a result only achieved by 3 of the entries participating in the SNPCC (namely Sako, JEDI-KDE and SNANA).

We also found outstanding purity and pseudo-purity results. All samples with SNR≥5 reported purity values larger than 75% for z ≥ 0.2, a result that was not obtained by any of the SNPCC entries. In particular, for 0.2 ≤ z < 0.4, D7+SNR≥5 obtained 94% < pur < 97%, while keeping a moderate FoM (figure 18).



Among the entries submitted to the SNPCC, only the InCA group used a similar approach, although by means of completely different techniques. The results they reported to the competition provide purity rates similar to the ones we get for SNR≥0.

We stress that, although the comparison with the SNPCC results is important, they cannot be considered on exactly the same grounds as our results. First, because they were built with different purposes (the SNPCC aimed at maximum FoMs and our goal was to achieve the highest possible purity while maintaining a reasonable FoM); second, because we were not time constrained as the groups taking the challenge were; and finally, because we had access to the answer key beforehand, something the competitors did not have. However, a strictly direct comparison with other results in the literature is possible through the post-SNPCC sample.

Recently, the InCA group made public a detailed analysis of the results achieved by their method when applied to the post-SNPCC data set (Richards et al. 2012, hereafter R2012). The two algorithms provided similar classification results, both achieving a local maximum of FoM_B around 0.5, with our method giving better results at lower and theirs at higher redshifts. Averaging over the entire redshift range, we achieve a FoM_B of 0.06 and R2012 reported 0.35. R2012 also provides results with different spectroscopic samples, constructed by re-distributing the DES available follow-up time. In their result with highest purity, they reported 90% purity, 8% eff_B and 0.08 FoM_B using a redshift limited spectroscopic sample. Our method provides 96% purity, 6% eff_B and 0.06 FoM_B for D7+SNR≥5.



Karpenka et al. (2012) also present results from post-SNPCC data. In their analysis, results from a parametric fit to the spectroscopic light curves are used to train a neural network which subsequently returns the probability of a new object being a Ia. Using 50% of the initial sample as a training set (~10000 objects considered spectroscopically confirmed), they found 80% purity, 85% eff_B and 0.51 FoM_B.

It is important to emphasize that the results we report 
above were achieved using a sub-set of the spectroscopic 
sample as it is given within the post-SNPCC data. This 
means that it is not necessary to tailor the spectroscopic 
sample a priori in order to get high purity results, making 
our method ideal as a first approach to a large photometric 
data set. 

In order to test the algorithm in a more restrictive scenario, we present results obtained from the post-SNPCC sub-sample with MultiColor Light-curve Shape (MLCS2k2) fit probability, FitProb>0.1. This sample contains light curves very similar to each other, and represents a more difficult classification challenge than the complete SNPCC data. We show that our method is not able to do more than identify the obviously non-Ia light curves when no SNR cuts are applied. However, when we compare results from data samples with SNR≥3, KPCA+1NN can boost purity levels to >95% independently of time window sampling.

Finally, we report the first attempt at classifying the post-SNPCC data using only pre-maximum epochs. This study is very important in selecting candidates for spectroscopic follow-up. Using only data between -10 and 0 days since maximum brightness, we obtained 63% purity, 71% eff_A, 77% SC and FoM_A of 0.26. This is a very encouraging result and reflects the vast room for improvement this kind of analysis may provide in different stages of the pipeline.

We stress that the application proposed here is merely an example of how the KPCA+kNN algorithm might be applied in astronomy. Beyond the specific problem of SNe Ia photometric classification, the same procedure can be used to identify other expected transient sources and even to spot not yet observed objects among a large and heterogeneous data set. The projection of such objects in PCs feature space would occupy a previously non-populated locus, which would give us a hint to further investigate that particular object. In the more ideal scenario, when synthetic light curves from a not yet observed object are available, a synthetic target can be included in the training sample, leading to a detection tailored according to our expectations. This provides yet another advantage over template fitting techniques, which deserves further investigation.

From what was presented here, we conclude that the decision of choosing one method over the other is not a straightforward one, but must be balanced by the characteristics of the data available and our goal in classifying it. Given that SNe without spectroscopic confirmation are not a future issue of large surveys, but a problem that is already present in the SDSS data (Sako et al. 2011), the KPCA+1NN algorithm proved to be the ideal choice to quickly increase the number of SNe Ia available for cosmology with minimum contamination. Alternatively, it can also be used as a complement to other techniques in helping to increase the number of SNe Ia in the training sample. Either way, we have enough evidence to trust the competitiveness of our algorithm within the current status of the SNe photometric classification field.



APPENDIX A: BASIC PROOFS

This appendix contains basic proofs for the statements used throughout the text. These are common in the machine learning field, but may not be so for the astronomy community. They follow closely Max Welling's notes A first encounter with Machine Learning and Schölkopf et al. (1996), which the reader is advised to check for a comprehensible introduction to the basic concepts used here.

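(i) All the vectors in the eigenvector space V lie in the space spanned by the data vectors contained in X

Consider $v_a \in V$,

$$\lambda_a v_a = C v_a = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T v_a = \frac{1}{N}\sum_{i=1}^{N} \left(x_i^T v_a\right) x_i \;\Rightarrow\; v_a = \sum_{i=1}^{N} \frac{x_i^T v_a}{N\lambda_a}\, x_i = \sum_{i=1}^{N} \alpha_i^a x_i. \tag{A1}$$

In other words, any eigenvector can be written as a linear combination of the vectors in X and, as a consequence, must lie in the space spanned by them.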

(ii) Determining equation ^ 

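Consider the projected eigenvalue equations,

$$x_i^T C v_a = \lambda_a x_i^T v_a. \tag{A2}$$

Using equation (2) and the expansion in (A1), we have

$$x_i^T \frac{1}{N}\sum_{j=1}^{N} x_j x_j^T \sum_{k=1}^{N}\alpha_k^a x_k = \lambda_a x_i^T \sum_{k=1}^{N}\alpha_k^a x_k$$

$$\frac{1}{N}\sum_{j,k}\alpha_k^a \left[x_i^T x_j\right]\left[x_j^T x_k\right] = \lambda_a \sum_{k=1}^{N}\alpha_k^a\left[x_i^T x_k\right] \tag{A3}$$

Addressing $K_{ij} = \left[x_i^T x_j\right]$, we can write

$$K\alpha^a = \tilde{\lambda}_a \alpha^a \quad \text{where} \quad \tilde{\lambda}_a = N\lambda_a. \tag{A4}$$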



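(iii) Determination of $\|\alpha^a\|$

The norm of parameters $\alpha^a$ is a consequence of the normalization of the eigenvectors in V. Using equation (5),

$$v_a^T v_a = 1 \;\Rightarrow\; \sum_{i,j}\alpha_i^a\alpha_j^a\left[x_i^T x_j\right] = (\alpha^a)^T K \alpha^a = 1 \;\Rightarrow\; N\lambda_a\,(\alpha^a)^T\alpha^a = 1. \tag{A5}$$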



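(iv) Obtaining $K_F$ and $\alpha_\Phi$

We begin with the definition of the covariance matrix in feature space,

$$C_F = \frac{1}{N}\sum_{i=1}^{N}\Phi(x_i)\Phi(x_i)^T; \tag{A6}$$

we have to find the eigenvalues, $\lambda_\Phi$, and eigenvectors, $v_\Phi$, which satisfy

$$\lambda_\Phi v_\Phi = C_F v_\Phi. \tag{A7}$$

Using item (i) above, we have that all $v_\Phi$ can be written as a linear combination of the $\Phi$'s. This means that we are allowed to consider the equivalent equations

$$\lambda_\Phi\left(\Phi(x_k)\cdot v_\Phi\right) = \left(\Phi(x_k)\cdot C_F v_\Phi\right), \quad \forall k, \tag{A8}$$

with the prescription that

$$v_\Phi = \sum_{i=1}^{N}\alpha_i\,\Phi(x_i). \tag{A9}$$

Using equations (A8) and (A9),

$$\lambda_\Phi\sum_{i=1}^{N}\alpha_i\left(\Phi(x_k)\cdot\Phi(x_i)\right) = \frac{1}{N}\sum_{i=1}^{N}\alpha_i\left(\Phi(x_k)\cdot\sum_{j=1}^{N}\Phi(x_j)\left(\Phi(x_j)\cdot\Phi(x_i)\right)\right). \tag{A10}$$

Calling

$$(K_F)_{ij} := \left(\Phi(x_i)\cdot\Phi(x_j)\right) \tag{A11}$$

leads to

$$N\lambda_\Phi K_F\,\alpha = K_F^2\,\alpha, \tag{A12}$$

where $\alpha$ is a column vector. As $K_F$ is symmetric,

$$K_F\,\alpha = \tilde{\lambda}_\Phi\,\alpha, \tag{A13}$$

with $\tilde{\lambda}_\Phi = N\lambda_\Phi$. In order to obtain $\alpha_\Phi$, we only need to diagonalize $K_F$.

The normalization of $\alpha_\Phi$ is achieved by requiring

$$\left(v_\Phi^k\cdot v_\Phi^k\right) = 1, \quad \forall k. \tag{A14}$$

Through equations (A9) and (A13) this converts into

$$\begin{aligned}
1 &= \sum_{i,j=1}^{N}\left[\alpha_\Phi^k\right]_i\left[\alpha_\Phi^k\right]_j\left(\Phi(x_i)\cdot\Phi(x_j)\right) \\
  &= \sum_{i,j=1}^{N}\left[\alpha_\Phi^k\right]_i\left[\alpha_\Phi^k\right]_j (K_F)_{ij} \\
  &= \left(\alpha_\Phi^k\cdot K_F\,\alpha_\Phi^k\right) \\
  &= \tilde{\lambda}_\Phi^k\left(\alpha_\Phi^k\cdot\alpha_\Phi^k\right).
\end{aligned} \tag{A15}$$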

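(v) Centralization in feature space

Consider the centred vectors in feature space,

$$\tilde{\Phi}(x_i) := \Phi(x_i) - \frac{1}{N}\sum_{n=1}^{N}\Phi(x_n); \tag{A16}$$

our goal now is to define the dot product matrix

$$\tilde{K}_{F,ij} = \tilde{\Phi}(x_i)^T\,\tilde{\Phi}(x_j). \tag{A17}$$

In a procedure similar to (iv) above, we arrive at the eigenvalue equation

$$\tilde{\lambda}_\Phi\,\tilde{\alpha}_\Phi = \tilde{K}_F\,\tilde{\alpha}_\Phi, \tag{A18}$$

which has eigenvectors $\tilde{v}_\Phi$, with

$$\tilde{v}_\Phi = \sum_{i=1}^{N}\tilde{\alpha}_i\,\tilde{\Phi}(x_i). \tag{A19}$$

In this case, we do not have the centred data points represented by equation (A16), so we need to write $\tilde{K}_F$ in terms of $K_F$. In what follows, consider $1_{ij} = 1$, $\forall\, i, j$. Using equations (A16) and (A17),

$$\begin{aligned}
\tilde{K}_{F,ij} &= \tilde{\Phi}(x_i)^T\,\tilde{\Phi}(x_j) \\
&= \left(\Phi(x_i) - \frac{1}{N}\sum_{m=1}^{N}\Phi(x_m)\right)^T \left(\Phi(x_j) - \frac{1}{N}\sum_{n=1}^{N}\Phi(x_n)\right) \\
&= \Phi(x_i)^T\Phi(x_j) - \frac{1}{N}\sum_{m=1}^{N}\Phi(x_m)^T\Phi(x_j) - \frac{1}{N}\sum_{n=1}^{N}\Phi(x_i)^T\Phi(x_n) + \frac{1}{N^2}\sum_{n,m=1}^{N}\Phi(x_m)^T\Phi(x_n) \\
&= K_{F,ij} - \frac{1}{N}\sum_{m=1}^{N} 1_{im}K_{F,mj} - \frac{1}{N}\sum_{n=1}^{N} K_{F,in}1_{nj} + \frac{1}{N^2}\sum_{n,m=1}^{N} 1_{im}K_{F,mn}1_{nj}.
\end{aligned} \tag{A20}$$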

Figure B1. Classification results using linear PCA for D1+SNR5. The colour code is the same used in figure 3.

Considering $(1_N)_{ij} := 1/N$, $\forall\,\{i, j\}$, we have the shorter version,

$$\tilde{K}_F = K_F - 1_N K_F - K_F 1_N + 1_N K_F 1_N. \tag{A21}$$
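
Equation (A21) can be checked numerically on a small example. The sketch below is written in plain Python for illustration only; it uses a linear kernel on three made-up one-dimensional points, for which the centred kernel must equal the product of the mean-subtracted values. The helper names `matmul` and `centre_kernel` are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def centre_kernel(kernel):
    """Apply K - 1_N K - K 1_N + 1_N K 1_N, with (1_N)_ij = 1/N."""
    n = len(kernel)
    one_n = [[1.0 / n] * n for _ in range(n)]
    left = matmul(one_n, kernel)      # 1_N K
    right = matmul(kernel, one_n)     # K 1_N
    both = matmul(left, one_n)        # 1_N K 1_N
    return [[kernel[i][j] - left[i][j] - right[i][j] + both[i][j]
             for j in range(n)] for i in range(n)]


# Linear kernel K_ij = x_i * x_j on three illustrative points
points = [1.0, 2.0, 4.0]
kernel = [[a * b for b in points] for a in points]
centred = centre_kernel(kernel)

# For a linear kernel the result equals (x_i - mean) * (x_j - mean)
mean = sum(points) / len(points)
expected = [[(a - mean) * (b - mean) for b in points] for a in points]
```

Every row and column of the centred matrix sums to zero, which is the feature-space analogue of subtracting the mean from the data.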
APPENDIX B: LINEAR PCA 

We present here the results we achieved from applying linear PCA to the post-SNPCC data. The procedure for deriving the PCs is described in subsection 2.1. The 2 PCs that best separate Ia and non-Ia data points were identified by using a cross-validation algorithm similar to the one described in subsection 3.2. The only difference is that, in the linear case, there is no parameter σ to adjust. The outcomes for sample D1 using different SNR cuts are displayed in table B1. The graphical representation of data point projections for the SNR≥5 case is shown in figure B1 and the redshift dependence of the classification results is displayed in figure B2.

Comparing results for D1+SNR5 when U class is included in the training, presented in tables B1 and F2, the reader can verify that using KPCA raises the efficiency levels from 56% to 84% and the purity levels from 73% to 91%. This corresponds to an approximately 50% increase in efficiency and 25% increase in purity.
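
The relative gains quoted above follow directly from the tabulated percentages, as the short check below illustrates (the function name is illustrative only).

```python
def relative_increase(before, after):
    """Fractional change of `after` with respect to `before`."""
    return (after - before) / before


eff_gain = relative_increase(56.0, 84.0)   # linear PCA -> KPCA efficiency
pur_gain = relative_increase(73.0, 91.0)   # linear PCA -> KPCA purity
print(f"{eff_gain:.0%} {pur_gain:.0%}")
```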



Table B1. Results from applying linear PCA+1NN to the post-SNPCC data, D1 sample. Ratios of efficiency (eff), purity (pur) and successful classification (SC) are reported in percentages (%).

Figure B2. Classification results for D1+SNR5 as a function of redshift using linear PCA. The colour code is the same used in figure 4.



APPENDIX C: RESULTS FOR D1 AS A FUNCTION OF REDSHIFT AND SNR CUTS

Figure C1 shows how the classification results for D1 (test sample) behave as a function of redshift and SNR selection cuts. Figure C2 shows SC, efficiency and FoM results normalized after selection cuts.

Examining the top-middle panel of figure C3, we see that eff_A also suffers at high redshift due to SNe classified as U (thin lines). This was another choice we made in order to preserve purity. Although a few SNe Ia are lost to the U class (which is bad for efficiency), so are non-Ia that would easily be mistaken for SNe Ia (which is good for purity). This effect becomes clear if we compare figures 4 and C2 to figure C3. From these we see that eff_A goes from 89% (without U type) to 84% (with U type) but at the same time purity increases from 80% to 91%, staying above 75% for the entire redshift range.


APPENDIX D: D8+SNR0 CLASSIFICATIONS

We present in figures D1, D2 and D3 the classification results for D8+SNR0. This is shown in order to facilitate comparison with other methods from the literature which do not apply SNR cuts. However, we emphasize that, for a given time sampling, this is the worst case scenario for our method. As shown in figure C1, the classification potential of the method is highly increased with better quality data (higher SNR).

Figure D2. Classification results as a function of redshift for Ia (D8+SNR0), including U class in the training sample. The panels show efficiency, purity, FoM and SC from top to bottom. The colour code is the same used in figure 4.

Figure D3. Analogous to figure D2 for non-Ia classifications.



APPENDIX E: SNPCC COMPLETE RESULTS 

Figure E1 shows detailed results obtained from the SNPCC sample for different time window samplings. It is composed of 4 big panels, each one containing plots for a diagnostic parameter, organized in 3 rows and 4 columns. The rows run through SNR≥5, SNR≥3 and SNR≥0, from top to bottom. The left-most column in each panel shows results for SNR cuts only, meaning that all SNe surviving the corresponding SNR cut were classified as Ia. Other columns represent D1,
D$ and D7, from left to right. Outcomes from D2, D& and Ds 
are similar to the ones presented in the plot, so we decided 
not to show them. 

The first thing to notice from this figure is that the time window sampling leads to small differences in the overall classification results. Clearly higher purity results come from D7, the only sub-sample which includes the second
maximum in the infra-red, for SNe la in 2 ^ 0.8. However, 
discrepancies between results from different SNR cuts are 
much larger. This shows that, despite the need to define a 
time window, the specific choice is not crucial in the deter- 
mination of final results. 

The same argument does not hold for SNR selection cuts. We see the crucial role played by the quality of each observation, no matter which diagnostic we analyse. Although this effect is noticeable in all of them, it is more evident in outcomes from eff_B and FoM_B, due to reasons already discussed in section 5. Nevertheless, our method achieved FoM_B > 0.25 for z < 0.25. In this redshift range, only SNPCC entries Sako, JEDI-KDE and SNANA cuts reported comparable results. The behaviour of our eff_B plots is almost opposite to what is reported from the SNPCC. In those, the efficiency is almost always very high, which frequently comes accompanied by a low purity result.



On the other hand, our results for purity and pseudo-purity are very good, especially for redshifts within [0.2, 0.5]. For all sub-samples with SNR≥5, we achieved purity values larger than 75% in this redshift range, a result that is not present in any of the entries in the SNPCC. Beyond that, D7+SNR≥5 gives good results for purity and pseudo-purity for z ≥ 0.5, confirming the importance of observing the second maximum in the infra-red.



APPENDIX F: SUMMARY TABLES 

We present below complete tables describing our results for different light curve time samplings and SNR cuts.

Table F1. Number of SNe in each post-SNPCC subset. The table also shows sub-samples of the D_i and [-10, 0[ according to SNR cuts.

Figure C1. Test sample classification results of efficiency, purity, FoM and SC for D1 as a function of redshift. The orange (dot-dashed), brown (dashed) and green (dotted) lines correspond to SNR≥5, SNR≥3 and SNR≥0, respectively.

Table F2. Summary of classification results for post-SNPCC data. Ratios of efficiency (eff_A/eff_B), purity (pur) and successful classification (SC) are reported in percentages (%).

Table F3. Number of SNe in each SNPCC subset. The table also shows sub-samples of the D_i according to SNR cuts.

Table F4. Summary of classification results for SNPCC sub-samples. Results for efficiency before (eff_B) and after (eff_A) selection cuts, purity (pur) and successful classification (SC) are reported in percentages (%).

Figure C3. Results from the post-SNPCC data for pur (top-left), eff_A (top-middle), eff_B (top-right), SC (bottom-left), FoM_A (bottom-middle) and FoM_B (bottom-right) as a function of redshift for D1+SNR5 and including U class in the training sample. The colour code is the same used in figure 4. Top-left and top-middle panels also show values of pseudo-purity and the percentage of SNe classified as U (thin lines, blue for training and red for test sample), respectively.

Figure D1. Classification results for D8+SNR0, including U class in the training sample. The colour code is the same used in figure 3.